Commit Graph

408 Commits

Author SHA1 Message Date
Arnaud Cogoluègnes 63d1ed9dfb Add meck to SAC coordinator test dependencies
And squash a dialyzer warning.
2023-04-04 19:32:57 +04:00
Arnaud Cogoluègnes 70538c587f Unblock group of consumers on super stream partition
A group of consumers on a super stream can end up blocked
without an active consumer. This can happen with consumer
churn: one consumer gets removed, which makes the active
consumer passive, but the former active consumer never
gets to know because it has been removed itself.

This commit changes the structure of the messages the SAC
coordinator sends to consumer connections, to embed enough
information to look up the group and to instruct it to choose
a new active consumer when the race condition mentioned above
comes up.

Because of the changes in the structure of messages, a feature
flag is required to make sure the SAC coordinator starts
sending the new messages only when all the nodes have been upgraded.

References #7743
2023-04-04 19:32:57 +04:00
Arnaud Cogoluègnes e9f2fa41ce
Take credits for inactive stream subscription
Not taking the credits can starve the subscription,
making it permanently under its credit send limit.

The subscription then never dispatches messages when
it becomes active again.

This happens in an active-inactive-active cycle, especially
with slow consumers.
2023-03-16 11:59:58 +01:00
Michael Klishin d0dc951343
Merge pull request #7058 from rabbitmq/add-node-lists-functions-to-clarify-intent
rabbit_nodes: Add list functions to clarify which nodes we are interested in
2023-02-13 23:06:50 -03:00
Michael Klishin 054381a99b
Merge pull request #7269 from rabbitmq/ff-stream_queue
Remove compatibility for feature flag stream_queue
2023-02-13 22:35:08 -03:00
David Ansari 575f4e78bc Remove compatibility for feature flag stream_queue
Remove compatibility code for feature flag `stream_queue`
because this feature flag is required in 3.12.

See #7219
2023-02-13 15:31:40 +00:00
Jean-Sébastien Pédron d65637190a
rabbit_nodes: Add list functions to clarify which nodes we are interested in
So far, we had the following functions to list nodes in a RabbitMQ
cluster:
* `rabbit_mnesia:cluster_nodes/1` to get members of the Mnesia cluster;
  the argument was used to select members (all members or only those
  running Mnesia and participating in the cluster)
* `rabbit_nodes:all/0` to get all members of the Mnesia cluster
* `rabbit_nodes:all_running/0` to get all members who currently run
  Mnesia

Basically:
* `rabbit_nodes:all/0` calls `rabbit_mnesia:cluster_nodes(all)`
* `rabbit_nodes:all_running/0` calls `rabbit_mnesia:cluster_nodes(running)`

We also have:
* `rabbit_node_monitor:alive_nodes/1` which filters the given list of
  nodes to only select those currently running Mnesia
* `rabbit_node_monitor:alive_rabbit_nodes/1` which filters the given
  list of nodes to only select those currently running RabbitMQ

Most of the code uses `rabbit_mnesia:cluster_nodes/1` or the
`rabbit_nodes:all*/0` functions. `rabbit_mnesia:cluster_nodes(running)`
or `rabbit_nodes:all_running/0` is often used as a close approximation
of "all cluster members running RabbitMQ". This list might be incorrect
in times where a node is joining the clustered or is being worked on
(i.e. Mnesia is running but not RabbitMQ).

With Khepri, there won't be the same possible approximation because we
will try to keep Khepri/Ra running even if RabbitMQ is stopped to
expand/shrink the cluster.

So in order to clarify what we want when we query a list of nodes, this
patch introduces the following functions:
* `rabbit_nodes:list_members/0` to get all cluster members, regardless
  of their state
* `rabbit_nodes:list_reachable/0` to get all cluster members we can
  reach using Erlang distribution, regardless of the state of RabbitMQ
* `rabbit_nodes:list_running/0` to get all cluster members who run
  RabbitMQ, regardless of the maintenance state
* `rabbit_nodes:list_serving/0` to get all cluster members who run
  RabbitMQ and are accepting clients

In addition to the list functions, there are the corresponding
`rabbit_nodes:is_*(Node)` checks and `rabbit_nodes:filter_*(Nodes)`
filtering functions.

The code is modified to use these new functions. One possible
significant change is that the new list functions will perform RPC calls
to query the nodes' state, unlike `rabbit_mnesia:cluster_nodes(running)`.
2023-02-13 12:58:40 +01:00
David Ansari 2f76102603 Remove compatibility for flag stream_single_active_consumer
Remove compatibility code for feature flag `stream_single_active_consumer`
because this feature flag is `required` in 3.12.

See https://github.com/rabbitmq/rabbitmq-server/pull/7219
2023-02-13 11:52:25 +00:00
David Ansari 79c12b60bc Use maybe expression instead of messy patterns
This commit is pure refactoring making the code base more maintainable.

Replace rabbit_misc:pipeline/3 with the new OTP 25 experimental maybe
expression because
"Frequent ways in which people work with sequences of failable
operations include folds over lists of functions, and abusing list
comprehensions. Both patterns have heavy weaknesses that makes them less
than ideal."
https://www.erlang.org/eeps/eep-0049#obsoleting-messy-patterns

Additionally, this commit is more restrictive in the type spec of
rabbit_mqtt_processor state fields.
Specifically, many fields were defined to be `undefined | T` where
`undefined` was only temporarily until the first CONNECT packet was
processed by the processor.
It's better to initialise the MQTT processor upon first CONNECT packet
because there is no point in having a processor without having received
any packet.
This allows many type specs in the processor to change from `undefined |
T` to just `T`.
Additionally, memory is saved by removing the `received_connect_packet`
field from the `rabbit_mqtt_reader` and `rabbit_web_mqtt_handler`.
2023-02-07 16:36:08 +01:00
Arnaud Cogoluègnes f558d80e3d
Activate stream delivery v2 correctly
As soon as v2 is supported, whatever the min supported
version is.
2023-01-31 16:49:48 +01:00
David Ansari 1f106fcd98 Fix wrong and add missing type specs 2023-01-25 17:13:54 +00:00
David Ansari 9c2f5975ea Support tracing in Native MQTT 2023-01-24 17:32:59 +00:00
David Ansari a8b69b43c1 Fix dialyzer issues and add function specs
Fix all dialyzer warnings in rabbitmq_mqtt and rabbitmq_web_mqtt.

Add more function specs.
2023-01-24 17:32:58 +00:00
David Ansari 56e97a9142 Fix MQTT in management plugin
1. Allow to inspect an (web) MQTT connection.
2. Show MQTT client ID on connection page as part of client_properties.
3. Handle force_event_refresh (when management_plugin gets enabled
   after (web) MQTT connections got created).
4. Reduce code duplication between protocol readers.
5. Display '?' instead of 'NaN' in UI for absent queue metrics.
6. Allow an (web) MQTT connection to be closed via management_plugin.

For 6. this commit takes the same approach as already done for the stream
plugin:
The stream plugin registers neither with {type, network} nor {type,
direct}.
We cannot use gen_server:call/3 anymore to close the connection
because the web MQTT connection cannot handle gen_server calls (only
casts).
Strictly speaking, this commit requires a feature flag to allow to force
closing stream connections from the management plugin during a rolling
update. However, given that this is rather an edge case, and there is a
workaround (connect to the node directly hosting the stream connection),
this commit will not introduce a new feature flag.
2023-01-24 17:30:10 +00:00
Rin Kuryloski 08d641a1a9 Fix dialyzer warnings revealed from previous commit 2023-01-19 17:29:29 +01:00
Michal Kuratczyk b8691b720b
Merge pull request #6862 from rabbitmq/small-chunks-opts
Move nopush to reader to try to better make use of packets
2023-01-19 09:01:24 +01:00
Karl Nilsson 83880154de Streams: Move nopush to reader to try to combine small chunks into larger IP packets.
Also change consumer credit top-ups to delay calling send_chunks until there is a "batch"
of credit to consume. Most clients at the time of writing send single credit updates after receiving each chunk so here we won't enter the send loop unless there are more than half the initial credits available.

osiris v1.4.3
2023-01-17 14:01:42 +00:00
Alexey Lebedeff 2c4e4fb691 Fix all dialyzer warnings in rabbitmq_stream
There are some elixir-related messages about undefined functions, but
they don't produce warnings (yet).
2023-01-16 17:11:24 +01:00
Arnaud Cogoluègnes d2443b5f49
Filter out nodes in maintenance in stream metadata
We don't want clients to try to connect to those.

Fixes #3370
2023-01-04 15:54:20 +01:00
Arnaud Cogoluègnes 79990e8eae
Format stream plugin code 2023-01-04 10:47:16 +01:00
Michael Klishin ec4f1dba7d
(c) year bump: 2022 => 2023 2023-01-01 23:17:36 -05:00
Karl Nilsson e4921503db Streams: close osiris log on unsubscribe
To ensure we clean up any log resources such as counter records.
2022-11-03 16:54:31 +00:00
Luke Bakken 7fe159edef
Yolo-replace format strings
Replaces `~s` and `~p` with their unicode-friendly counterparts.

```
git ls-files *.erl | xargs sed -i.ORIG -e s/~s>/~ts/g -e s/~p>/~tp/g
```
2022-10-10 10:32:03 +04:00
Michael Klishin 796902a5a5
Merge pull request #5961 from rabbitmq/rabbitmq-server-5956-stream-route-partitions-response-prefix
Fix keys for route and partitions responses
2022-10-03 17:32:56 +04:00
Arnaud Cogoluègnes c33e4ae3c6
Use leader locator "balanced" by default for add_super_stream
For the CLI, otherwise all the created individual streams end
up with their leader on the same node, whichi is not what we
want. "balanced" is a reasonable default and it can be overridden
anyway.
2022-10-03 11:53:06 +02:00
Arnaud Cogoluègnes 0c1eeab92b
Fix keys for route and partitions responses
These 2 responses were not using the `rabbit_stream_core` utilities
so they were not using the "prefix" for responses.

Fixes #5956
2022-10-03 09:42:40 +02:00
Michal Kuratczyk 2855278034
Migrate from supervisor2 to supervisor 2022-09-27 13:53:06 +02:00
Arnaud Cogoluègnes f92b2a3d7d
Complement rabbitmq-stream.8 with SAC and super streams 2022-09-23 09:57:42 +02:00
Ayanda Dube 767f0c9e4a Rename and fix rabbit_stream_metrics_gc childspec identifier from supervisor context 2022-09-12 23:32:01 +01:00
David Ansari b953b0f10e Stop sending stats to rabbit_event
Stop sending connection_stats from protocol readers to rabbit_event.
Stop sending queue_stats from queues to rabbit_event.
Sending these stats every 5 seconds to the event manager process is
superfluous because noone handles these events.

They seem to be a relict from before rabbit_core_metrics ETS tables got
introduced in 2016.

Delete test head_message_timestamp_statistics because it tests that
head_message_timestamp is set correctly in queue_stats events
although queue_stats events are used nowhere.
The functionality of head_message_timestamp itself is still tested in
deps/rabbit/test/priority_queue_SUITE.erl and
deps/rabbit/test/temp/head_message_timestamp_tests.py
2022-09-09 10:52:38 +00:00
Arnaud Cogoluègnes 20216ef3c1
Remove experimental warning in super stream commands 2022-09-06 15:42:12 +02:00
Arnaud Cogoluègnes 26b4e6fa41
Use atom_to_binary/1 instead of rabbit_data_coercion 2022-08-08 18:09:58 +02:00
Arnaud Cogoluègnes 93c33f2423
Rename StreamInfo to StreamStats
Other changes: returns a map of int64, use the new osiris:get_stats/1 API.

References #5412
2022-08-08 18:02:51 +02:00
Arnaud Cogoluègnes f223845d43
Keep stream_* return codes
To keep compatibility with the Erlang client's users.

References #5412
2022-08-03 16:46:22 +02:00
Arnaud Cogoluègnes e587e9a8ef
Make process liveness check remote if necessary
References #5412
2022-08-03 15:52:38 +02:00
Arnaud Cogoluègnes 8687e73c7e
Add StreamInfo command to stream protocol
It returns general information on a stream, the first
and committed offsets for now.

Fixes #5412
2022-08-03 14:38:45 +02:00
Jean-Sébastien Pédron dd6b52d3f2
rabbit_stream: Use rabbit_feature_flags, not `rabbit_ff_registry`
`rabbit_ff_registry` is an internal module of the feature flags
subsystem, it is not meant to be called outside of it. In particular, it
may lead to crashes very difficult to debug when a feature flag is
enabled. The reason is that this module is compiled and reloaded at
runtime, so the module may disappear for a small period of time. Another
reason is that a process calling `rabbit_ff_registry` may hold that
module, preventing the feature flags subsystem from performing its task.

The correct public API to query the state of a feature flag is
`rabbit_feature_flags:is_enabled/1`. It will always return a boolean. If
the feature flag being queried is in `state_changing` state, this
function takes care of waiting for the outcome.

See #5018.
2022-07-28 14:12:41 +02:00
Arnaud Cogoluègnes 5402aab71a
Get committed offset from log state
No need of an extra argument in the sendfile callback.

References #5307
2022-07-26 17:39:55 +02:00
Arnaud Cogoluègnes ecda90de93
Remove a couple of unnecessary ++ 2022-07-26 17:07:34 +02:00
Arnaud Cogoluègnes 95f3fdef19
Use "committed offset" instead of "last committed offset"
References #5307
2022-07-26 11:37:46 +02:00
Arnaud Cogoluègnes 9795727b9f
Include last committed offset in "deliver" stream command
Fixes #5307
2022-07-25 16:15:04 +02:00
Arnaud Cogoluègnes 67aac95f3e
Add exchange command versions command to stream plugin
References #5308
2022-07-25 16:15:03 +02:00
Arnaud Cogoluègnes 37b43fb290
Lower log level on stream creation validation failure
No need to log an error when the arguments are not the same,
info is enough.
2022-07-05 12:18:27 +02:00
Gabriele Santomaggio 3ae0f732d6
Formatting
Signed-off-by: Gabriele Santomaggio <G.santomaggio@gmail.com>
2022-07-01 16:03:22 +02:00
Gabriele Santomaggio a43ca47d95
Fix read frame_max and heartbeat parameters
Signed-off-by: Gabriele Santomaggio <G.santomaggio@gmail.com>
2022-07-01 15:56:41 +02:00
David Ansari b2e821e772 Change normal connection close log level from warning to debug
This line gets logged when the client closes the connection to the
stream port before it authenticates successfully.

Some external load balancers for example connect to the stream port to
do health checks without sending any stream protocol frame.

This commits prevents the RabbitMQ log from being polluted.
2022-06-14 14:12:33 +02:00
Arnaud Cogoluègnes e44b65957d
Limit stream max segment size to 3 GB
Values too large can overflow the stream position field
in the index (32 bit int).
2022-06-10 11:45:57 +02:00
Philip Kuryloski a250a533a4 Remove elixir related -ignore_xref calls
As they are no longer necessary with xref2 and the erlang.mk updates
2022-06-09 23:18:40 +02:00
Karl Nilsson 7e8d08471b Use the leader node when looking up a publisher sequence.
Else the replica may crash or at the very least return the incorrect result.
2022-05-25 15:04:43 +01:00
David Ansari 20677395cd Check queue and exchange existence with ets:member/2
This reduces memory usage and improves code readability.
2022-05-10 10:16:40 +00:00
Arnaud Cogoluègnes 68e8ae8673
Rename function for clarity
References #3753
2022-05-09 10:52:38 +02:00
Arnaud Cogoluègnes 8ccdd7b6f7
Move SAC API to SAC coordinator
As per @ansd's suggestion. This way the stream coordinator
does not have to know about the SAC commands.

References #3753
2022-05-09 10:52:38 +02:00
Arnaud Cogoluègnes f4e2a95e6c
Address code review comments for stream SAC
References #3753
2022-05-09 10:52:37 +02:00
Arnaud Cogoluègnes da8c23d7ce
Adapt clause to send_chunk new return type
References #3753
2022-05-09 10:52:36 +02:00
Arnaud Cogoluègnes 7415abe3b5
Log message if message update response code is not OK
References #3753
2022-05-09 10:52:36 +02:00
Arnaud Cogoluègnes ee79237bf6
Return error if SAC has no name
Single active consumer must have a name, which is used as the reference
for storing offsets and as the name of the group the consumer belongs
to in case the stream is a partition of a super stream.

References #3753
2022-05-09 10:52:36 +02:00
Arnaud Cogoluègnes 4cb814dd1c
Check consumer is actually a SAC on SAC event
And do not do anything if it's not.

References #3753
2022-05-09 10:52:35 +02:00
Arnaud Cogoluègnes adba97fe6e
Do not dispatch on credit request is consumer is inactive
A formerly active consumer can have in-flight credit
requests when it becomes inactive. This commit checks
the state of consumer on credit requests and make sure
not to dispatch messages if it's inactive.
2022-05-09 10:52:35 +02:00
Arnaud Cogoluègnes a1e92b82a0
Add active and activity_status to list_stream_consumers
References #3753
2022-05-09 10:52:33 +02:00
Arnaud Cogoluègnes ad29176723
Add active and activity_status flag to stream consumer metrics
Stream consumers can be active or not with SAC, so these 2 fields
are added to the stream metrics. This is the same as with
regular consumers.

References #3753
2022-05-09 10:52:33 +02:00
Arnaud Cogoluègnes cd22bef50c
Use feature flag public API
References #3753
2022-05-09 10:52:32 +02:00
Arnaud Cogoluègnes 0806fa0b2f
Add stream single active consumer feature flag
Block stream SAC functions if the feature flag is not enabled.

References #3753
2022-05-09 10:52:32 +02:00
Arnaud Cogoluègnes 0623192130
Handle "active" field for stream consumers
In the regular metrics ETS table, not the one from streams.

References #3753
2022-05-09 10:52:31 +02:00
Arnaud Cogoluègnes 5ad9e349e0
Add list_stream_group_consumers CLI command
References #3753
2022-05-09 10:52:31 +02:00
Arnaud Cogoluègnes fb6481c5da
Add ignore_xref entries 2022-05-09 10:52:30 +02:00
Arnaud Cogoluègnes 434d7b5c54
Add colums argument to list_stream_consumer_groups
References #3753
2022-05-09 10:52:30 +02:00
Arnaud Cogoluègnes 1ec49a7aa3
Add list_stream_consumer_groups CLI command
References #3753
2022-05-09 10:52:29 +02:00
Arnaud Cogoluègnes de7f5e1088
Do not emit consumer lag for inactive consumer
References #3753
2022-05-09 10:52:28 +02:00
Arnaud Cogoluègnes eeefd7c860
Monitor connection PIDs in stream SAC coordinator
References #3753
2022-05-09 10:52:27 +02:00
Arnaud Cogoluègnes 4ff2c4a03d
Refactor stream SAC coordinator
Separate logic between single SAC and SAC in a partition of a super
stream (makes code clearer), wait for former active consumer
notification to select the new active consumer (avoids a lock
of some sort on the group, so consumers can come and go).

References #3753
2022-05-09 10:52:25 +02:00
Arnaud Cogoluègnes 037af8c57f
Use RA side effects to notify consumer connections
In SAC coordinator. The messages to connections cannot be sent
from the state machine or they will be sent also during the
replication. Only the leader enforces side effects.

References #3753
2022-05-09 10:52:24 +02:00
Arnaud Cogoluègnes 60529aae84
Move SAC coordinator to stream coordinator
The SAC group coordination should happen as part of raft state
machine, to make it more reliable and robust.
It would also benefit from different "services" the coordinator
and RA provide.

References #3753
2022-05-09 10:52:24 +02:00
Arnaud Cogoluègnes 70598325a9
Fix stream consumer state change
References #3753
2022-05-09 10:52:23 +02:00
Arnaud Cogoluègnes 2fe558ab50
Fix stream SAC computation and notification
Make sure to notify former active when unsubscription
triggers rebalancing.

References #3753
2022-05-09 10:52:23 +02:00
Arnaud Cogoluègnes 11ef789c38
Fix stream SAC computation and notification
Start index from the end because of lists:foldr.
Make sure to notify only the passive newcomer when there's
no change in SAC.

References #3753
2022-05-09 10:52:23 +02:00
Arnaud Cogoluègnes a607f842d1
Implement basic SAC rollover on stream partition
References #3753
2022-05-09 10:52:22 +02:00
Arnaud Cogoluègnes 3a05cf18c5
Remove unused function
References #3753
2022-05-09 10:52:22 +02:00
Arnaud Cogoluègnes 866518469c
Refactor SAC coordinator for SAC in partition context
References #3753
2022-05-09 10:52:22 +02:00
Arnaud Cogoluègnes e91009bb92
Notify stream consumer connection of SAC changes
References #3753
2022-05-09 10:52:21 +02:00
Arnaud Cogoluègnes d98b10f78d
Compute stream partition index
References #3753
2022-05-09 10:52:21 +02:00
Arnaud Cogoluègnes da3f11a03b
Unregister stream SAC consumers on connection closing
References #3753
2022-05-09 10:52:20 +02:00
Arnaud Cogoluègnes c47d83fd8e
Handle SAC (simple) failover
When consumers unsubscribe normally.

References #3753
2022-05-09 10:52:20 +02:00
Arnaud Cogoluègnes d5ae62b1a9
Handle single active consumer registration
WIP. Uses a simple in-memory coordinator for now.
No failover yet.

References #3753
2022-05-09 10:52:20 +02:00
Michael Klishin 37a3448672
Merge pull request #4442 from rabbitmq/quorum-queue-leader-locator
Add quorum queue-leader-locator
2022-04-15 09:31:45 +04:00
David Ansari f32e80c01c Convert random and least-leaders to balanced
Deprecate queue-leader-locator values 'random' and 'least-leaders'.
Both become value 'balanced'.

From now on only queue-leader-locator values 'client-local', and
'balanced' should be set.

'balanced' will place the leader on the node with the least leaders if
there are few queues and will select a random leader if there are many
queues.
This avoid expensive least leaders calculation if there are many queues.

This change also allows us to change the implementation of 'balanced' in
the future. For example 'balanced' could place a leader on a node
depending on resource usage or available node resources.

There is no need to expose implementation details like 'random' or
'least-leaders' as configuration to users.
2022-04-11 10:39:28 +02:00
Philip Kuryloski a22234f6eb Updates for rules_erlang 2.5.0
rabbitmq_cli uses some private rules_erlang apis that have changed in
the upcoming release

Additionally:
- Avoid including both standard and test versions of amqp_client in
integration test suites
- Eliminate most of the compilation order hints (explicit first_srcs)
in the bazel build
- Fix an include statement - in bazel, an app is not available to
itself as a library at compilation time
2022-04-07 14:54:37 +02:00
Arnaud Cogoluègnes 90d110eb20
Return consumer in send_chunks
Not a tuple. The passed-in consumer is updated with the returned
tuple, so we're better off returning the updated consumer record.

References #4368
2022-03-29 16:49:24 +02:00
Arnaud Cogoluègnes aa88fd5b29
Make Osiris listener registration idempotent
Several listeners for the same offset can be registered,
which causes problems for low rate, where consumers hit
the end of the stream often. Then Osiris messages accumulate
in the process mailbox, which slows it down considerably after
some time.

This commit makes the listener registration idempotent for a given
offset.
2022-03-29 11:52:42 +02:00
Michael Klishin c38a3d697d
Bump (c) year 2022-03-21 01:21:56 +04:00
Arnaud Cogoluègnes 043989f4fb
Handle stream coordinator error response for topology request
A stream may not be available or in an inconsistent state and the
stream coordinator reports this with an error that the stream
manager handles like an appropriate response. The stream protocol
adapter then tries to extract the topology from the error.

This commit makes the stream manager handles the error correctly
so that the stream protocol adapter reports the unavailability
of the stream to the client.
2022-02-07 10:28:20 +01:00
Arnaud Cogoluègnes 4dc04926ff
Tolerate unknown fields in stream CLI command requests
This is useful in case new fields are needed in further
versions. A new version node can ask for new fields to an
old version node in a mixed-version cluster.
If the old version node returns a known
value for unknown fields instead of failing, the new
node can set up appropriate default value for these
fields in the result of the CLI commands.
2022-02-02 17:10:56 +01:00
Arnaud Cogoluègnes bd4771addb
Return appropriate response when stream leader not available
Fixes #3874
2021-12-10 10:58:48 +01:00
Arnaud Cogoluègnes b6f5b1c21a
Merge pull request #3842 from rabbitmq/rabbitmq-server-3841-stream-consumer-nested-record
Create configuration nested record in stream consumer record
2021-12-03 11:11:40 +01:00
Arnaud Cogoluègnes 854efb3e85
Declare variable for better formatting
References #3841
2021-12-03 10:59:45 +01:00
Arnaud Cogoluègnes 81b9632535
Create configuration nested record in stream consumer record
Only a couple of fields of the stream consumer record change
very frequently (credits and Osiris log reference), so this commit
introduces a nested record in the main consumer record that
contains the immutable fields. This potentially avoids producing
a lot of garbage, especially when the consumer state contains
several properties (consumer name, or single active consumer information
in the future).

Fixes #3841
2021-12-03 10:34:51 +01:00
Arnaud Cogoluègnes 17d0ba9317
Return "no offset" (19) code when stored offset is undefined
Fixes #3783
2021-12-01 17:59:14 +01:00
Arnaud Cogoluègnes 6aabfdbbb4
Format stream plugin 2021-10-20 14:53:08 +02:00
Michael Klishin 2140562209
Merge pull request #3562 from rabbitmq/disable_stream_plugin_3557
Disable the stream_plugin if stream feature flag is not enabled
2021-10-14 18:45:26 +03:00
Arnaud Cogoluègnes 6d0dc3f6b4
Attach add/delete super stream command to streams scope only
Not to ctl.

References #3503
2021-10-14 17:09:18 +02:00
Michael Klishin 2fbc5fab25
Formatting, wording 2021-10-14 17:20:59 +03:00
Gabriele Santomaggio f2df98aef9 change message 2021-10-13 12:01:21 +02:00
Gabriele Santomaggio 2eb625531d Disable the stream_plugin and stream management if the feature flag
is not enabled.
fixes https://github.com/rabbitmq/rabbitmq-server/issues/3557
2021-10-13 11:59:11 +02:00
Gabriele Santomaggio c270ebf037 Disable the stream_plugin if the feature flag
is not enabled.
fixes https://github.com/rabbitmq/rabbitmq-server/issues/3557
2021-10-12 19:09:34 +02:00
Arnaud Cogoluègnes 6b9589bae4
Handle stream arguments in add_super_stream command
max-age, leader-locator, etc.
2021-10-11 16:50:03 +02:00
Arnaud Cogoluègnes b0bd5f8a00
Add delete_super_stream CLI command 2021-10-11 16:50:02 +02:00
Arnaud Cogoluègnes a73b1a3d0d
Add add_super_stream CLI command 2021-10-11 16:50:01 +02:00
Arnaud Cogoluègnes 147659093f
Add functions to create/delete super stream in manager 2021-10-11 16:50:01 +02:00
Arnaud Cogoluègnes fc80138204
Fallback to rabbit_stream:host/0 if advertised_tls_host not set
The advertised_host must also be tried for TLS connections.

References #3514
2021-09-28 14:00:41 +02:00
Carl Hörberg 52791c677b
Support for advertising different hostname for TLS stream connections
Use case: Allow plain connections over one (internal IP), and TLS
connections over another IP (eg. internet routable IP). Without this
patch a cluster can only support access over one or the other IP, not
both.

(cherry picked from commit b9e6aad035)
2021-09-28 13:47:22 +02:00
Michael Klishin 0f6a9dac27
Introduce rabbit_nodes:all/0 2021-09-20 22:24:25 +03:00
Arnaud Cogoluègnes 5b83dceb87
Return only streams for partition-related commands
The stream partition metadata is based on bindings,
so we make sure to return only streams from the binding
information.
2021-09-15 11:33:04 +02:00
Arnaud Cogoluègnes 04a0653571
Sort stream partitions using binding parameter
If present. To make the partition order stable.
2021-09-14 18:02:22 +02:00
Michael Klishin c8d483809e
Resolve a missed conflict 2021-09-13 20:33:11 +03:00
Michael Klishin f79bc1c935
Merge branch 'master' into stream-reader-close-in-terminate
Conflicts:
	deps/rabbitmq_stream/src/rabbit_stream_reader.erl
2021-09-13 20:27:58 +03:00
Arnaud Cogoluègnes 8f207e3c5f
Make stream protocol route command return several streams
We expect to have 1 stream for each routing key, but
as binding can return several queues for a given key we
let that possibility open in the stream protocol.
2021-09-13 17:53:25 +02:00
Karl Nilsson 9e4506041d fix build warnings 2021-09-13 11:38:41 +01:00
Karl Nilsson 135575b3ff Stream reader: close osiris logs and sockets in terminate
Instead of injecting it into varios places inside the code.

When the osiris log is closed it will decrement the global "readers"
counter which is why it is much safer to do this in terminate.
2021-09-13 11:23:35 +01:00
Karl Nilsson 3b1714cbe3 formatting 2021-09-10 15:26:26 +01:00
Karl Nilsson f10db03b4d Gracefully terminate stream reaader
when the client forcefully terminates TCP connection

Also improve logging.
2021-09-10 15:24:29 +01:00
Karl Nilsson d6301a3e11 Handle closed connections in stream reader
and throw and stop gracefully.
2021-09-10 10:15:59 +01:00
Karl Nilsson 3513fa0ea8 rabbitmq_stream formatting 2021-09-09 09:45:13 +01:00
Karl Nilsson c240ec2985
Fix function_clause error in stream reader
When the server initiate connection close.
2021-08-31 15:29:16 +01:00
Gerhard Lazu dad0025088
Perform stream reader cleanup in terminate
Otherwise metrics will not get cleaned up correctly when processes crash.

It's also tidier to do this in a single place, in terminate/3

Pair: @kjnilsson

Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>
2021-08-31 15:29:15 +01:00
Michael Klishin 2d3f31eb21
Merge pull request #3204 from rabbitmq/keep-state-and-data
Use keep_state_and_data
2021-07-27 21:39:06 +03:00
David Ansari e3ed9c21b0 Fix list_stream_publishers additional usage output 2021-07-22 18:13:07 +02:00
Michael Klishin 0d06f34c66
rabbit_stream_reader: convert most log messages to debug ones 2021-07-21 01:38:13 +03:00
David Ansari 644335de86 Use keep_state_and_data 2021-07-20 16:11:22 +02:00
Michael Klishin e20bca44cc
rabbit_stream_reader: these should not be logged at info level 2021-07-20 00:55:40 +03:00
Michael Klishin 532d076907
Merge pull request #3194 from rabbitmq/stream-reader-state-timeouts
Add stream reader state timeouts
2021-07-19 20:16:04 +03:00
David Ansari 863b899079 Remove TEST macro
since it fails with Bazel.

As discussed with @pjk25, let's set this value via application env,
make it configurable to the test, but not configurable to the user.
2021-07-19 16:42:54 +02:00
David Ansari 4053f729dd Rename STATE_TIMEOUT to CONNECTION_NEGOTIATION_STEP_TIMEOUT 2021-07-15 20:56:56 +02:00
David Ansari 694804d0d2 Add timeout reason to log message 2021-07-15 20:53:46 +02:00
David Ansari 3964da37b4 Close TCP connection when stream reader times out
Add state timeouts.
If the client takes more than 10s for a single step in the authentication
protocol, make the server close the TCP connection.

Also close the TCP connection if the server times out in state
close_sent. That's the case when the client sends an invalid command
(after successful authentication), the server requests the client to
close the connection, but the client doesn't respond anymore.
2021-07-15 19:29:24 +02:00
dcorbacho 9e128b72b4 Set info/2 timeout to infinity to list connections
Default gen_server timeout is not enough to list busy connections.
Setting it to infinity allows the caller to decide the timeout,
as classic queues do. The `emit_info` function's family sets its
own timeout for all the cli commands.
2021-07-14 17:16:22 +02:00
Michael Klishin 29bb9c5b0c
Merge pull request #3175 from processone/proxy_protocol_tls_info
Extract TLS informations that are delivered in PROXY protocol frame
2021-07-13 15:08:40 +03:00
Arnaud Cogoluègnes 8ddff0faf8
Use "store" instead of "commit" for offset tracking 2021-07-08 11:28:33 +02:00
Arnaud Cogoluègnes f9867f1f82
Add uncompressed size field for pub ids generation 2021-07-05 16:22:21 +02:00
Paweł Chmielowski d5daf7598b Extract TLS informations that are delivered in PROXY protocol frame 2021-07-05 13:29:59 +02:00
Arnaud Cogoluègnes be9cc22dc1
Add uncompressed size in stream sub-entry 2021-07-02 16:19:47 +02:00
Gerhard Lazu ef4303a486
Merge pull request #3157 from rabbitmq/stream-protocol-counters
Add specific stream protocol counters to track protocol errors
2021-07-01 17:56:14 +01:00
Arnaud Cogoluègnes f1f733445e
Check publisher still exists on osiris_written event 2021-07-01 10:47:58 +02:00
dcorbacho 58e36b6417 Add specific stream protocol counters to track protocol errors 2021-06-29 12:50:00 +02:00
dcorbacho 228ea40e34
Gauges for global publishers & consumers metrics 2021-06-29 08:10:42 +01:00
David Ansari b145684b1b Remove useless ensure_stats_timer calls
Calling ensure_stats_timer after init_stats_timer and reset_stats_timer
is enough.

The idea is to call stop_stats_timer before hibernation and
ensure_stats_timer on wakeup. However, since we never call
stop_stats_timer in rabbit_stream_reader, we don't need to call
ensure_stats_timer on every network activity.
2021-06-28 11:27:45 +02:00
David Ansari 896d879f8d Fix heartbeater exception exit
Before this commit test AlarmsTest.diskAlarmShouldNotPreventConsumption
of the Java client was failing.
When executing that test, the server failed with:

2021-06-25 16:11:02.886935+02:00 [error] <0.1301.0>     exception exit: {unexpected_message,resume}
2021-06-25 16:11:02.886935+02:00 [error] <0.1301.0>       in function  rabbit_heartbeat:heartbeater/3 (src/rabbit_heartbeat.erl, line 138

because the heartbeater was tried to be resumed without being paused
before.

Above exception exit also happens on master branch when executing this
test. However, the test falsely succeeds on master because the following FIXME was
never implemented:
8e569ad8bf/deps/rabbitmq_stream/src/rabbit_stream_reader.erl (L778)
2021-06-26 14:04:05 +02:00
David Ansari 8c4e2e009d Log at debug level when state machine terminates 2021-06-26 14:02:00 +02:00
David Ansari 81ee05f9ce Convert rabbit_stream_reader into state machine
This is pure refactoring - no functional change.

Benefits:
* code is more maintainable
* smaller methods (instead of previous 350 lines listen_loop_post_auth function)
* well defined state transitions (e.g. useful to enforce authentication protocol)
* we get some gen_statem helper functions for free (e.g. debug utilities)

Useful doc: https://ninenines.eu/docs/en/ranch/2.0/guide/protocols/
2021-06-25 15:07:34 +02:00
David Ansari ff174eaa5f Add behaviour declaration for rabbit_stream_metrics_gc
since it implements a gen_server.
2021-06-25 11:57:14 +02:00
Gerhard Lazu c7971252cd
Global counters per protocol + protocol AND queue_type
This way we can show how many messages were received via a certain
protocol (stream is the second real protocol besides the default amqp091
one), as well as by queue type, which is something that many asked for a
really long time.

The most important aspect is that we can also see them by protocol AND
queue_type, which becomes very important for Streams, which have
different rules from regular queues (e.g. for example, consuming
messages is non-destructive, and deep queue backlogs - think billions of
messages - are normal). Alerting and consumer scaling due to deep
backlogs will now work correctly, as we can distinguish between regular
queues & streams.

This has gone through a few cycles, with @mkuratczyk & @dcorbacho
covering most of the ground. @dcorbacho had most of this in
https://github.com/rabbitmq/rabbitmq-server/pull/3045, but the main
branch went through a few changes in the meantime. Rather than resolving
all the conflicts, and then making the necessary changes, we (@gerhard +
@kjnilsson) took all learnings and started re-applying a lot of the
existing code from #3045. We are confident in this approach and would
like to see it through. We continued working on this with @dumbbell, and
the most important changes are captured in
https://github.com/rabbitmq/seshat/pull/1.

We expose these global counters in rabbitmq_prometheus via a new
collector. We don't want to keep modifying the existing collector, which
grew really complex in parts, especially since we introduced
aggregation, but start with a new namespace, `rabbitmq_global_`, and
continue building on top of it. The idea is to build in parallel, and
slowly transition to the new metrics, because semantically the changes
are too big since streams, and we have been discussing protocol-specific
metrics with @kjnilsson, which makes me think that this approach is
least disruptive and... simple.

While at this, we removed redundant empty return value handling in the
channel. The function called no longer returns this.

Also removed all DONE / TODO & other comments - we'll handle them when
the time comes, no need to leave TODO reminders.

Pairs @kjnilsson @dcorbacho @dumbbell
(this is multiple commits squashed into one)

Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>
2021-06-22 14:14:21 +01:00
dcorbacho 38f474688f Stream common library 2021-06-11 17:24:00 +02:00