
352 Commits

Author SHA1 Message Date
Karl Nilsson 9e4506041d fix build warnings 2021-09-13 11:38:41 +01:00
Karl Nilsson 135575b3ff Stream reader: close osiris logs and sockets in terminate
Instead of injecting it into various places inside the code.

When the osiris log is closed it will decrement the global "readers"
counter which is why it is much safer to do this in terminate.
2021-09-13 11:23:35 +01:00
Karl Nilsson 3b1714cbe3 formatting 2021-09-10 15:26:26 +01:00
Karl Nilsson f10db03b4d Gracefully terminate stream reader
when the client forcefully terminates TCP connection

Also improve logging.
2021-09-10 15:24:29 +01:00
Karl Nilsson d6301a3e11 Handle closed connections in stream reader
and throw and stop gracefully.
2021-09-10 10:15:59 +01:00
Karl Nilsson 3513fa0ea8 rabbitmq_stream formatting 2021-09-09 09:45:13 +01:00
Federico Caprari 2246727428
Fix store offset parameters
As you can see 860333a088/deps/rabbitmq_stream_common/src/rabbit_stream_core.erl (L239)

There is the stream name and not the subscription id in this message.
2021-09-01 22:13:07 +02:00
Arnaud Cogoluègnes 902fa429dd
Use awaitMatch to check global counters
Assertion fails on CI environment.
2021-09-01 09:51:06 +02:00
Karl Nilsson c240ec2985
Fix function_clause error in stream reader
When the server initiates a connection close.
2021-08-31 15:29:16 +01:00
Gerhard Lazu 6c0ba03d61
Test that we start from 0 publishers & consumers
Pair: @kjnilsson

Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>
2021-08-31 15:29:16 +01:00
Gerhard Lazu 0ecf3d4eeb
Test stream publisher & consumer counters
Pair: @kjnilsson

Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>
2021-08-31 15:29:16 +01:00
Gerhard Lazu dad0025088
Perform stream reader cleanup in terminate
Otherwise metrics will not get cleaned up correctly when processes crash.

It's also tidier to do this in a single place, in terminate/3

Pair: @kjnilsson

Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>
2021-08-31 15:29:15 +01:00
Arnaud Cogoluègnes 8dc024089c
Bump dependencies in Java stream tests 2021-08-31 08:32:53 +02:00
Michael Klishin ace3ee9cd9
Bump Java stream client used in tests 2021-08-10 11:32:57 +03:00
Michael Klishin 2d3f31eb21
Merge pull request #3204 from rabbitmq/keep-state-and-data
Use keep_state_and_data
2021-07-27 21:39:06 +03:00
David Ansari e3ed9c21b0 Fix list_stream_publishers additional usage output 2021-07-22 18:13:07 +02:00
Karl Nilsson f38c023aa8
Correct Publish frame docs 2021-07-21 09:13:55 +01:00
Michael Klishin 0d06f34c66
rabbit_stream_reader: convert most log messages to debug ones 2021-07-21 01:38:13 +03:00
Gabriele Santomaggio 71c355c963
Fix links 2021-07-20 17:42:56 +02:00
David Ansari 644335de86 Use keep_state_and_data 2021-07-20 16:11:22 +02:00
Gabriele Santomaggio de0f7952e9 Add resources link
to client guide line and streams internals
2021-07-20 09:55:25 +02:00
Gabriele Santomaggio 84f51d7926 Remove project maturity warning 2021-07-20 09:00:11 +02:00
Michael Klishin e20bca44cc
rabbit_stream_reader: these should not be logged at info level 2021-07-20 00:55:40 +03:00
Michael Klishin 70ebefc0f2
rabbit_stream: ignore escript and sbin 2021-07-19 20:26:16 +03:00
Michael Klishin 532d076907
Merge pull request #3194 from rabbitmq/stream-reader-state-timeouts
Add stream reader state timeouts
2021-07-19 20:16:04 +03:00
David Ansari 863b899079 Remove TEST macro
since it fails with Bazel.

As discussed with @pjk25, let's set this value via application env,
make it configurable to the test, but not configurable to the user.
2021-07-19 16:42:54 +02:00
David Ansari 13b03b8530 Remove unused variable warning 2021-07-19 14:36:22 +02:00
David Ansari 4053f729dd Rename STATE_TIMEOUT to CONNECTION_NEGOTIATION_STEP_TIMEOUT 2021-07-15 20:56:56 +02:00
David Ansari 694804d0d2 Add timeout reason to log message 2021-07-15 20:53:46 +02:00
David Ansari 3964da37b4 Close TCP connection when stream reader times out
Add state timeouts.
If the client takes more than 10s for a single step in the authentication
protocol, make the server close the TCP connection.

Also close the TCP connection if the server times out in state
close_sent. That's the case when the client sends an invalid command
(after successful authentication), the server requests the client to
close the connection, but the client doesn't respond anymore.
2021-07-15 19:29:24 +02:00
dcorbacho 9e128b72b4 Set info/2 timeout to infinity to list connections
Default gen_server timeout is not enough to list busy connections.
Setting it to infinity allows the caller to decide the timeout,
as classic queues do. The `emit_info` function's family sets its
own timeout for all the cli commands.
2021-07-14 17:16:22 +02:00
Michael Klishin 29bb9c5b0c
Merge pull request #3175 from processone/proxy_protocol_tls_info
Extract TLS information that is delivered in PROXY protocol frame
2021-07-13 15:08:40 +03:00
Philip Kuryloski 860333a088
Merge pull request #3177 from rabbitmq/stream-commit-offset-becomes-store-offset
Use "store" instead of "commit" for offset tracking
2021-07-13 12:11:37 +02:00
Philip Kuryloski 8f9de08de7 Also assert no missing suites for all other deps 2021-07-12 18:05:55 +02:00
Philip Kuryloski 71ae7e7d14
Merge pull request #3186 from rabbitmq/use-bazel-erlang-native-sharding
Use bazel erlang native sharding
2021-07-12 12:36:37 +02:00
Philip Kuryloski 3eac3cf8a8 Remove unused load statements from bazel files 2021-07-12 12:10:26 +02:00
Philip Kuryloski 8421100008 Use bazel-erlang semi-automatic suite sharding 2021-07-09 10:05:16 +02:00
Arnaud Cogoluègnes 8ddff0faf8
Use "store" instead of "commit" for offset tracking 2021-07-08 11:28:33 +02:00
Arnaud Cogoluègnes 7cb2645283
Replace commit with store for offset persistence
In stream protocol. "Commit" has a strong consistency connotation,
which is not actually enforced by the offset persistence
mechanism.
2021-07-08 10:32:04 +02:00
Arnaud Cogoluègnes f9867f1f82
Add uncompressed size field for pub ids generation 2021-07-05 16:22:21 +02:00
Paweł Chmielowski d5daf7598b Extract TLS information that is delivered in PROXY protocol frame 2021-07-05 13:29:59 +02:00
Arnaud Cogoluègnes be9cc22dc1
Add uncompressed size in stream sub-entry 2021-07-02 16:19:47 +02:00
Gerhard Lazu ef4303a486
Merge pull request #3157 from rabbitmq/stream-protocol-counters
Add specific stream protocol counters to track protocol errors
2021-07-01 17:56:14 +01:00
Arnaud Cogoluègnes f1f733445e
Check publisher still exists on osiris_written event 2021-07-01 10:47:58 +02:00
dcorbacho b636ad2565 Rename protocol error counters to _total 2021-06-30 12:46:41 +02:00
Philip Kuryloski b807db3fd0 Update rabbitmq_stream deps in bazel
for changes occurring in 58e36b6417
2021-06-29 13:07:06 +02:00
dcorbacho 58e36b6417 Add specific stream protocol counters to track protocol errors 2021-06-29 12:50:00 +02:00
dcorbacho 228ea40e34
Gauges for global publishers & consumers metrics 2021-06-29 08:10:42 +01:00
David Ansari b145684b1b Remove useless ensure_stats_timer calls
Calling ensure_stats_timer after init_stats_timer and reset_stats_timer
is enough.

The idea is to call stop_stats_timer before hibernation and
ensure_stats_timer on wakeup. However, since we never call
stop_stats_timer in rabbit_stream_reader, we don't need to call
ensure_stats_timer on every network activity.
2021-06-28 11:27:45 +02:00
David Ansari 896d879f8d Fix heartbeater exception exit
Before this commit test AlarmsTest.diskAlarmShouldNotPreventConsumption
of the Java client was failing.
When executing that test, the server failed with:

2021-06-25 16:11:02.886935+02:00 [error] <0.1301.0>     exception exit: {unexpected_message,resume}
2021-06-25 16:11:02.886935+02:00 [error] <0.1301.0>       in function  rabbit_heartbeat:heartbeater/3 (src/rabbit_heartbeat.erl, line 138

because an attempt was made to resume the heartbeater without it
having been paused before.

Above exception exit also happens on master branch when executing this
test. However, the test falsely succeeds on master because the following FIXME was
never implemented:
8e569ad8bf/deps/rabbitmq_stream/src/rabbit_stream_reader.erl (L778)
2021-06-26 14:04:05 +02:00
David Ansari 8c4e2e009d Log at debug level when state machine terminates 2021-06-26 14:02:00 +02:00
David Ansari 81ee05f9ce Convert rabbit_stream_reader into state machine
This is pure refactoring - no functional change.

Benefits:
* code is more maintainable
* smaller functions (instead of the previous 350-line listen_loop_post_auth function)
* well defined state transitions (e.g. useful to enforce authentication protocol)
* we get some gen_statem helper functions for free (e.g. debug utilities)

Useful doc: https://ninenines.eu/docs/en/ranch/2.0/guide/protocols/
2021-06-25 15:07:34 +02:00
David Ansari ff174eaa5f Add behaviour declaration for rabbit_stream_metrics_gc
since it implements a gen_server.
2021-06-25 11:57:14 +02:00
Philip Kuryloski a3c97d491f Update additional test skipping for 3.8/3.9 mixed versions 2021-06-25 11:17:46 +02:00
Philip Kuryloski bb75157fc1 Mark deps/rabbitmq_stream:commands_SUITE as flaky 2021-06-24 12:56:20 +02:00
Philip Kuryloski 8c7e7e0656 Revert "Default all `rabbitmq_integration_suite` to flaky in bazel"
This reverts commit 70cb8147b2.
2021-06-23 20:53:14 +02:00
Gerhard Lazu c7971252cd
Global counters per protocol + protocol AND queue_type
This way we can show how many messages were received via a certain
protocol (stream is the second real protocol besides the default amqp091
one), as well as by queue type, which is something that many have been
requesting for a really long time.

The most important aspect is that we can also see them by protocol AND
queue_type, which becomes very important for Streams, which have
different rules from regular queues (for example, consuming
messages is non-destructive, and deep queue backlogs - think billions of
messages - are normal). Alerting and consumer scaling due to deep
backlogs will now work correctly, as we can distinguish between regular
queues & streams.

This has gone through a few cycles, with @mkuratczyk & @dcorbacho
covering most of the ground. @dcorbacho had most of this in
https://github.com/rabbitmq/rabbitmq-server/pull/3045, but the main
branch went through a few changes in the meantime. Rather than resolving
all the conflicts, and then making the necessary changes, we (@gerhard +
@kjnilsson) took all learnings and started re-applying a lot of the
existing code from #3045. We are confident in this approach and would
like to see it through. We continued working on this with @dumbbell, and
the most important changes are captured in
https://github.com/rabbitmq/seshat/pull/1.

We expose these global counters in rabbitmq_prometheus via a new
collector. We don't want to keep modifying the existing collector, which
grew really complex in parts, especially since we introduced
aggregation, but start with a new namespace, `rabbitmq_global_`, and
continue building on top of it. The idea is to build in parallel, and
slowly transition to the new metrics, because semantically the changes
are too big since streams, and we have been discussing protocol-specific
metrics with @kjnilsson, which makes me think that this approach is
least disruptive and... simple.

While at this, we removed redundant empty return value handling in the
channel. The function called no longer returns this.

Also removed all DONE / TODO & other comments - we'll handle them when
the time comes, no need to leave TODO reminders.

Pairs @kjnilsson @dcorbacho @dumbbell
(this is multiple commits squashed into one)

Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>
2021-06-22 14:14:21 +01:00
Philip Kuryloski 70cb8147b2 Default all `rabbitmq_integration_suite` to flaky in bazel
Most tests that can start rabbitmq nodes have some chance of
flaking. Rather than chase individual flakes for now, this commit
changes the default (though it can still be overridden, as is the case
for config_scheme_SUITE in many places, since I have yet to see that
particular suite flake).
2021-06-21 16:10:38 +02:00
Philip Kuryloski eea51a7e3a Mark additional suites as flaky 2021-06-17 13:42:22 +02:00
Philip Kuryloski a3bffb4d18 Fixup bazel compilation for rabbitmq_stream_common 2021-06-14 10:07:49 +02:00
dcorbacho 38f474688f Stream common library 2021-06-11 17:24:00 +02:00
Michael Klishin 0d84dc1e48
Merge pull request #3098 from rabbitmq/unexpected-socket-input
Close stream socket if client doesn't follow authentication protocol
2021-06-11 03:25:45 +03:00
David Ansari 1cca0f1e4c Categorize connection log messages 2021-06-10 19:50:48 +02:00
David Ansari fcc8dbeab6 Close client connections that don't follow authentication protocol
Before this commit, sending garbage data to the server stream port
caused the RabbitMQ node to eat more and more memory.

In this commit, we fix it by expecting the client to go through the
proper authentication sequence. Otherwise, the server closes the socket.

Co-authored-by: Michal Kuratczyk <mkuratczyk@pivotal.io>
2021-06-10 15:44:51 +02:00
Arnaud Cogoluègnes dcd65572a0
Remove correlation ID from commit_offset
In stream protocol. commit_offset is asynchronous and does
not expect a response, so the correlation ID is not required.
2021-06-10 15:21:38 +02:00
Arnaud Cogoluègnes b77fb27af3
Merge pull request #3083 from rabbitmq/raw-reader
Raw reader option
2021-06-08 17:48:20 +02:00
dcorbacho 935f57b608 Chunk selector option in offset reader 2021-06-08 15:17:22 +02:00
Michal Kuratczyk 7407a5a100 Apply policy in rabbit_queue_type 2021-06-07 12:30:33 +02:00
Arnaud Cogoluègnes 7d5c8f402a
Merge pull request #3071 from rabbitmq/subscribe-props
Fix properties binary construction
2021-06-03 18:09:52 +02:00
Philip Kuryloski c3c9b3fc50 Merge branch 'bazel-dialyze' 2021-06-01 10:31:29 +02:00
Philip Kuryloski 30f9a95b9f Add dialyze for remaining tier-1 plugins 2021-06-01 10:19:10 +02:00
Arnaud Cogoluègnes 761af0a7a0
Extract publishing IDs from batch publishing
In stream plugin, to e.g. send publish errors in case the stream
does not exist. Batches were not taken into account.
2021-05-31 15:35:10 +02:00
dcorbacho 65c9dae53f Fix properties binary construction 2021-05-28 15:45:19 +02:00
Karl Nilsson f36751aa6d make rabbit_stream_SUITE more reliable
By having rabbit_stream_core cache its incoming command internally.
2021-05-27 13:15:55 +01:00
Philip Kuryloski f251815002 Replace rabbitmq_stream test helper with common version
from rabbitmq_ct_helpers

and update default app env for bazel, to match Makefile
2021-05-27 12:26:51 +02:00
Arnaud Cogoluègnes 2ab5cb22ca
Expose TLS info for stream connections (CLI, REST API) 2021-05-27 10:43:33 +02:00
Arnaud Cogoluègnes 5a6dbef372
Return TLS port in stream connection properties 2021-05-26 12:30:54 +02:00
Arnaud Cogoluègnes 69ad6969e6
Add stream.advertised_tls_port setting 2021-05-26 11:08:43 +02:00
Arnaud Cogoluègnes 35ef1e5ade
Merge pull request #3038 from rabbitmq/stream-tls
TLS support for streams
2021-05-25 15:39:40 +02:00
dcorbacho 05bd6dd838 Test multiple chunks 2021-05-25 14:25:38 +02:00
Philip Kuryloski a6f70b8dda Add xref for remaining tier-1 plugins 2021-05-25 11:39:03 +02:00
Arnaud Cogoluègnes b7a2e9a792
Fix comment 2021-05-25 09:53:04 +02:00
Karl Nilsson 4a9d8115f8 rebase fixes
post rebase test fixes

Make socket initialisation more lenient

correct return types

fix

remove commented code
2021-05-24 15:53:10 +01:00
dcorbacho 3fefa8e8d4 Use ssl option when initialising data reader 2021-05-21 17:13:15 +01:00
dcorbacho 8f54150867 Add stream TLS test 2021-05-21 17:10:55 +01:00
dcorbacho b2a7884a45 TLS support for streams 2021-05-21 16:40:57 +01:00
Karl Nilsson 03063f2eed
Merge pull request #3043 from rabbitmq/streams-consumer-lag-metrics
Add consumer offset_lag to rabbitmq-stream CLI command & Management
2021-05-21 16:38:50 +01:00
Gerhard Lazu 080b0771cf
Fix test_gc_consumers test
Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>
2021-05-21 13:02:48 +01:00
Gerhard Lazu f8b4e1e298
Add consumer offset_lag to rabbitmq-stream CLI command & Management
This is an important metric to keep track of and be aware (maybe even
alert on) when consumers fall behind consuming stream messages. While
they should be able to catch up, if they fall behind too much and the
stream gets truncated, they may miss messages.

This is something that we want to expose via Prometheus metrics as well,
but we've started closer to the core, CLI & Management.

This should be merged as soon as it passes CI, we shouldn't wait on the
Prometheus changes - they can come later.

Pair: @kjnilsson

Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>
2021-05-21 13:02:48 +01:00
Arnaud Cogoluègnes 8f8e08a9a9
Send advertised host and port in open
More secure than in peer properties frame, which is just
at the beginning of the connection opening.
2021-05-21 13:03:38 +02:00
Michal Kuratczyk 6c38b42ad6 Log heartbeats at debug; remove newlines
A log line every minute about a successful heartbeat pollutes the logs.
So do empty log lines.
2021-05-20 18:46:15 +02:00
Arnaud Cogoluègnes c30e013d7a
Rename max-segment-size to stream-max-segment-size-bytes 2021-05-20 10:16:19 +02:00
Arnaud Cogoluègnes c42930acb0
Set stream plugin default port to 5552 2021-05-19 15:38:52 +02:00
Karl Nilsson ef52b92390 Make stream consumer arg parsing return error
when receiving unexpected input
2021-05-19 12:00:10 +01:00
Arnaud Cogoluègnes c15805b472
Fix stream protocol open origin
Client, not server.
2021-05-19 12:34:35 +02:00
Arnaud Cogoluègnes 7adac7a71b
Add subscription properties to stream protocol 2021-05-19 12:26:30 +02:00
Arnaud Cogoluègnes 0b73c9337c
Fix some logging statements in stream plugin
Missing arguments in the format.
2021-05-19 11:22:15 +02:00
Arnaud Cogoluègnes d9b7523987
Handle connection closing when dispatching stream messages 2021-05-19 10:05:37 +02:00
Arnaud Cogoluègnes 7145a1a2ad
Trigger event on stream consumer cancellation
To make sure metrics are cleaned up.
2021-05-18 17:42:33 +02:00
Arnaud Cogoluègnes 194198a450
Add stream consumer properties to list command 2021-05-18 17:16:06 +02:00