This includes the global_labels feature introduced in deadtrickster/prometheus.erl#91
To test, run `docker-compose up` in the docker dir, then navigate to
localhost:15692/metrics & localhost:3000/dashboards (admin:admin) to see
the Grafana RabbitMQ Overview dashboard.
Add nodes, alarms & partitions to global counts. These are too important
to leave out. Need to discuss how to expose these via metrics.
[#164374397]
Set memory high watermark to 256MiB to force the memory alarm to
trigger and to ensure messages get paged to disk (forces disk reads).
Make all legends display as table so that values are easier to see when
toggling them.
This produces a bad rabbitmq-server build: perf-test crashes & so do
rabbit_channels. Will build a full rabbitmq-server-generic-unix locally,
since this mixing & matching is definitely trouble.
publisher-confirms_1 | Main thread caught exception: java.io.IOException
publisher-confirms_1 | 13:07:38.003 [main] ERROR com.rabbitmq.perf.PerfTest - Main thread caught exception
publisher-confirms_1 | java.io.IOException: null
publisher-confirms_1 | at com.rabbitmq.client.impl.AMQChannel.wrap(AMQChannel.java:129)
publisher-confirms_1 | at com.rabbitmq.client.impl.AMQChannel.wrap(AMQChannel.java:125)
publisher-confirms_1 | at com.rabbitmq.client.impl.AMQChannel.exnWrappingRpc(AMQChannel.java:147)
publisher-confirms_1 | at com.rabbitmq.client.impl.ChannelN.open(ChannelN.java:133)
publisher-confirms_1 | at com.rabbitmq.client.impl.ChannelManager.createChannel(ChannelManager.java:182)
publisher-confirms_1 | at com.rabbitmq.client.impl.AMQConnection.createChannel(AMQConnection.java:555)
publisher-confirms_1 | at com.rabbitmq.client.impl.recovery.AutorecoveringConnection.createChannel(AutorecoveringConnection.java:165)
publisher-confirms_1 | at com.rabbitmq.perf.MulticastParams$TopologyHandlerSupport.configureQueues(MulticastParams.java:616)
publisher-confirms_1 | at com.rabbitmq.perf.MulticastParams$FixedQueuesTopologyHandler.configureQueuesForClient(MulticastParams.java:699)
publisher-confirms_1 | at com.rabbitmq.perf.MulticastParams.createConsumer(MulticastParams.java:405)
publisher-confirms_1 | at com.rabbitmq.perf.MulticastSet.createConsumers(MulticastSet.java:244)
publisher-confirms_1 | at com.rabbitmq.perf.MulticastSet.run(MulticastSet.java:126)
publisher-confirms_1 | at com.rabbitmq.perf.PerfTest.main(PerfTest.java:276)
publisher-confirms_1 | at com.rabbitmq.perf.PerfTest.main(PerfTest.java:374)
publisher-confirms_1 | Caused by: com.rabbitmq.client.ShutdownSignalException: connection error
publisher-confirms_1 | at com.rabbitmq.utility.ValueOrException.getValue(ValueOrException.java:66)
publisher-confirms_1 | at com.rabbitmq.utility.BlockingValueOrException.uninterruptibleGetValue(BlockingValueOrException.java:36)
publisher-confirms_1 | at com.rabbitmq.client.impl.AMQChannel$BlockingRpcContinuation.getReply(AMQChannel.java:502)
publisher-confirms_1 | at com.rabbitmq.client.impl.AMQChannel.privateRpc(AMQChannel.java:293)
publisher-confirms_1 | at com.rabbitmq.client.impl.AMQChannel.exnWrappingRpc(AMQChannel.java:141)
publisher-confirms_1 | ... 11 common frames omitted
publisher-confirms_1 | Caused by: java.net.SocketException: Connection reset
publisher-confirms_1 | at java.base/java.net.SocketInputStream.read(SocketInputStream.java:186)
publisher-confirms_1 | at java.base/java.net.SocketInputStream.read(SocketInputStream.java:140)
publisher-confirms_1 | at java.base/java.io.BufferedInputStream.fill(BufferedInputStream.java:252)
publisher-confirms_1 | at java.base/java.io.BufferedInputStream.read(BufferedInputStream.java:271)
publisher-confirms_1 | at java.base/java.io.DataInputStream.readUnsignedByte(DataInputStream.java:293)
publisher-confirms_1 | at com.rabbitmq.client.impl.Frame.readFrom(Frame.java:91)
publisher-confirms_1 | at com.rabbitmq.client.impl.SocketFrameHandler.readFrame(SocketFrameHandler.java:164)
publisher-confirms_1 | at com.rabbitmq.client.impl.AMQConnection$MainLoop.run(AMQConnection.java:598)
publisher-confirms_1 | at java.base/java.lang.Thread.run(Thread.java:834)
rabbitmq1_1 | 2019-04-25 12:40:53.778 [info] <0.1215.0> accepting AMQP connection <0.1215.0> (172.25.0.7:38752 -> 172.25.0.4:5672)
rabbitmq1_1 | 2019-04-25 12:40:53.840 [info] <0.1215.0> Connection <0.1215.0> (172.25.0.7:38752 -> 172.25.0.4:5672) has a client-provided name: perf-test-test
rabbitmq1_1 | 2019-04-25 12:40:53.849 [info] <0.1215.0> connection <0.1215.0> (172.25.0.7:38752 -> 172.25.0.4:5672 - perf-test-test): user 'guest' authenticated and granted access to vhost '/'
rabbitmq1_1 | 2019-04-25 12:40:53.855 [info] <0.1215.0> closing AMQP connection <0.1215.0> (172.25.0.7:38752 -> 172.25.0.4:5672 - perf-test-test, vhost: '/', user: 'guest')
rabbitmq1_1 | 2019-04-25 12:40:53.860 [info] <0.1224.0> accepting AMQP connection <0.1224.0> (172.25.0.7:38754 -> 172.25.0.4:5672)
rabbitmq1_1 | 2019-04-25 12:40:53.862 [info] <0.1224.0> Connection <0.1224.0> (172.25.0.7:38754 -> 172.25.0.4:5672) has a client-provided name: perf-test-configuration
rabbitmq1_1 | 2019-04-25 12:40:53.864 [info] <0.1224.0> connection <0.1224.0> (172.25.0.7:38754 -> 172.25.0.4:5672 - perf-test-configuration): user 'guest' authenticated and granted access to vhost '/'
rabbitmq1_1 | 2019-04-25 12:40:53.877 [info] <0.1231.0> accepting AMQP connection <0.1231.0> (172.25.0.7:38756 -> 172.25.0.4:5672)
rabbitmq1_1 | 2019-04-25 12:40:53.880 [info] <0.1231.0> Connection <0.1231.0> (172.25.0.7:38756 -> 172.25.0.4:5672) has a client-provided name: perf-test-consumer-0
rabbitmq1_1 | 2019-04-25 12:40:53.882 [info] <0.1231.0> connection <0.1231.0> (172.25.0.7:38756 -> 172.25.0.4:5672 - perf-test-consumer-0): user 'guest' authenticated and granted access to vhost '/'
rabbitmq1_1 | 2019-04-25 12:40:53.890 [error] <0.1239.0> CRASH REPORT Process <0.1239.0> with 0 neighbours exited with reason: no match of right hand value undefined in rabbit_channel:init_queue_cleanup_timer/1 line 2604 in gen_server2:init_it/6 line 597
rabbitmq1_1 | 2019-04-25 12:40:53.891 [error] <0.1231.0> CRASH REPORT Process <0.1231.0> with 0 neighbours crashed with reason: no match of right hand value {error,{'EXIT',{{badmatch,{error,{{{badmatch,undefined},[{rabbit_channel,init_queue_cleanup_timer,1,[{file,"src/rabbit_channel.erl"},{line,2604}]},{rabbit_channel,init,1,[{file,"src/rabbit_channel.erl"},{line,528}]},{gen_server2,init_it,6,[{file,"src/gen_server2.erl"},{line,554}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,249}]}]},{child,undefined,channel,{rabbit_channel,start_link,[1,<0.1231.0>,<0.1237.0>,<0.1231.0>,<<"172.25.0.7:38756 -> 172.25.0.4:5672">>,rabbit_framing_amqp_0_9_1,...]},...}}}},...}}} in rabbit_reader:create_channel/2 line 923
rabbitmq1_1 | 2019-04-25 12:40:53.891 [error] <0.1229.0> Supervisor {<0.1229.0>,rabbit_connection_sup} had child reader started with rabbit_reader:start_link(<0.1230.0>, {acceptor,{0,0,0,0,0,0,0,0},5672}) at <0.1231.0> exit with reason no match of right hand value {error,{'EXIT',{{badmatch,{error,{{{badmatch,undefined},[{rabbit_channel,init_queue_cleanup_timer,1,[{file,"src/rabbit_channel.erl"},{line,2604}]},{rabbit_channel,init,1,[{file,"src/rabbit_channel.erl"},{line,528}]},{gen_server2,init_it,6,[{file,"src/gen_server2.erl"},{line,554}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,249}]}]},{child,undefined,channel,{rabbit_channel,start_link,[1,<0.1231.0>,<0.1237.0>,<0.1231.0>,<<"172.25.0.7:38756 -> 172.25.0.4:5672">>,rabbit_framing_amqp_0_9_1,...]},...}}}},...}}} in rabbit_reader:create_channel/2 line 923 in context child_terminated
rabbitmq1_1 | 2019-04-25 12:40:53.891 [error] <0.1229.0> Supervisor {<0.1229.0>,rabbit_connection_sup} had child reader started with rabbit_reader:start_link(<0.1230.0>, {acceptor,{0,0,0,0,0,0,0,0},5672}) at <0.1231.0> exit with reason reached_max_restart_intensity in context shutdown
rabbitmq1_1 | 2019-04-25 12:40:54.376 [warning] <0.1224.0> closing AMQP connection <0.1224.0> (172.25.0.7:38754 -> 172.25.0.4:5672 - perf-test-configuration, vhost: '/', user: 'guest'):
rabbitmq1_1 | client unexpectedly closed TCP connection
Capture limits in thresholds. Even if they are static and somewhat
specific to this RabbitMQ deployment, it's better to have them when
demo-ing the end-to-end Prometheus/Grafana experience.
[#164374751]
This lights up the `Published confirmed / s` Grafana panel.
To light up `Published unroutable / s`, unbind all queues from the
direct exchange.
[#164374751]
This has support for disabling metrics_collector, as captured in
rabbitmq/rabbitmq-management-agent#78 & rabbitmq/rabbitmq-management#691
Since we want management to be enabled, this doesn't help our use-case,
but this option is perfect for users that want metrics, but don't want
to pay the overhead of Management - especially metric aggregations.
[#164376052]
After running `docker-compose up`, open Grafana via
http://localhost:3000 and log in with user admin & password admin. After
logging in, you will see a RabbitMQ Overview dashboard pre-loaded (/・0・)
Thanks @cirocosta! https://github.com/cirocosta/sample-grafana
cc @MarcialRosales
[finishes #164374321]
Captures all node metrics shown on the Overview page:
* File descriptors
* Socket descriptors
* Erlang processes
* Memory
* Disk
Not displaying any limits since they would make the variations
impossible to see. For example, when file descriptors go from 90 to 30,
if one of the metrics on the graph is 1048576 (Docker image default for
rabbitmq_node_sockets_total), it's impossible to see the metric change
from 90 to 30. The same problem is present in the current RabbitMQ Management
graphs on the node page, under Node statistics.
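To put a rough number on the scale problem, the sketch below computes how
far a value moves vertically on a linear y-axis when it drops from 90 to
30. The 300 px panel height and the helper name are assumptions made for
illustration only; 1048576 is the limit quoted above.

```python
def pixels_moved(old, new, y_max, panel_height_px=300):
    """Vertical distance in pixels between two values on a linear axis 0..y_max.

    panel_height_px is an assumed panel height, not a Grafana default.
    """
    return abs(old - new) / y_max * panel_height_px


# Limit plotted on the same graph: the y-axis has to reach 1048576.
with_limit = pixels_moved(90, 30, y_max=1048576)

# Limit left out: the y-axis only needs to reach roughly 100.
without_limit = pixels_moved(90, 30, y_max=100)

print(f"with limit:    {with_limit:.3f} px")
print(f"without limit: {without_limit:.0f} px")
```

With the limit on the graph the change is well under one pixel, so the
line looks flat; without it the same change spans more than half the
panel.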
No thresholds have been set. Threshold values must be defined as
integers in Grafana 6; we can't reference metrics, e.g.
rabbitmq_node_sockets_total. Templating the dashboard would be one way,
but the problem with that is keeping it in sync with limits. It's a more
difficult problem than meets the eye, so deferring it for now.
Created on Grafana v6.1
[finishes #164374321]
Bumping all prometheus-related deps to latest stable. Defining them in
rabbitmq-components.mk, so that they can be promoted to all deps in
umbrella.
rabbitmq_management_agent is required for alarm-related metrics to be
available.
Added node label to most `rabbitmq_` metrics. I need help adding it to
mfa_totals - the metrics_node_label_test test currently fails. The new
unit tests ensure that label/0 behaves as expected in all cases, which
made refactoring easy. Run unit tests via:
gmake eunit EUNIT_MODS=prometheus_rabbitmq_core_metrics_collector
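As a quick check from outside the Erlang test suite, the scraped output
can be scanned for `rabbitmq_` samples that lack a node label. This is
only a sketch: the function name, the sample text, the node name and the
two `*_example_*` metric names are hypothetical, and the parser is a
simplification that does not handle `}` inside label values.

```python
import re

# One sample line of the Prometheus text format: name, optional {labels}, value.
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>[^}]*)\})?\s+(?P<value>\S+)'
)
# A single label pair inside the braces: key="value" (escaped quotes allowed).
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def metrics_missing_label(text, label="node", prefix="rabbitmq_"):
    """Return the sorted names of prefixed metrics with a sample lacking `label`."""
    missing = set()
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        m = SAMPLE_RE.match(line)
        if m is None or not m.group("name").startswith(prefix):
            continue  # unparseable line, or a metric outside the prefix
        labels = dict(LABEL_RE.findall(m.group("labels") or ""))
        if label not in labels:
            missing.add(m.group("name"))
    return sorted(missing)


# Hypothetical scrape output; only rabbitmq_node_sockets_total is a name
# taken from these notes, the rest is made up for the example.
SAMPLE = '''\
# TYPE rabbitmq_node_sockets_total gauge
rabbitmq_node_sockets_total{node="rabbit@rabbitmq1"} 1048576
rabbitmq_example_without_label 3
erlang_example_metric 1
'''

print(metrics_missing_label(SAMPLE))
```

Feeding it the body of localhost:15692/metrics instead of SAMPLE would
list the metrics that still need the label.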
Updating to the latest erlang.mk makes running eunit tests much faster:
2s vs 10s. To do this, comment out `ERLANG_MK_*` in Makefile and run
`gmake erlang-mk`.