This is a follow-up to https://github.com/rabbitmq/ra/pull/160
Had to introduce mf_convert/3 so that the METRICS_REQUIRING_CONVERSIONS
proplist does not clash with METRICS_RAW proplists that have the same
number of elements. This is begging to be refactored, but I know that
@dcorbacho is working on https://github.com/rabbitmq/rabbitmq-prometheus/issues/26
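The actual change is in Erlang; as a rough, hypothetical Python sketch of the idea (the metric name and the conversion are invented for illustration), keying conversions by metric name avoids relying on the shape of the proplist entries:

```python
# Hypothetical sketch: dispatch conversions by metric name rather than by
# the shape/arity of the entry, so a raw metric and a metric requiring
# conversion can have the same number of elements without clashing.

METRICS_REQUIRING_CONVERSIONS = {
    # invented example: stored in microseconds, emitted in seconds
    "io_read_time": lambda micros: micros / 1_000_000,
}

def mf_convert(name, value):
    """Apply a registered conversion, or pass the raw value through."""
    convert = METRICS_REQUIRING_CONVERSIONS.get(name)
    return convert(value) if convert is not None else value

print(mf_convert("io_read_time", 2_500_000))  # 2.5
print(mf_convert("queue_messages", 42))       # 42
```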
Also modified the RabbitMQ-Quorum-Queues-Raft dashboard
Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>
Grafana will keep failing with the following error message otherwise:
failed to load dashboard from /dashboards/__inputs.json Dashboard title cannot be empty
It still puts a significant load on the host, but any lower and we won't
see any change in the Uncommitted log entries graph, and too little
variation in the Log entry commit latency.
Well, almost. flat-statusmap-panel v0.1.1 breaks on Grafana v6.5.0.
Since it's already been mentioned in
https://github.com/flant/grafana-statusmap/issues/76 for a different
reason, let's wait until this is addressed.
Most keys load fine, but if one doesn't, everything fails. The package
will still verify OK even if we have just a subset of keys installed, so
be more permissive.
Now:
autocomplete ac | Configure shell for autocompletion - eval "$(gmake autocomplete)"
clean-docker cd | Clean all Docker containers & volumes
cto cto | Interact with all containers via a top-like utility
dist-tls dt | Make Erlang-Distribution panels come alive - HIGH LOAD
docker-image di | Build & push Docker image to Docker Hub
docker-image-build dib | Build Docker image locally - make tests
docker-image-bump diu | Bump Docker image version across all docker-compose-* files
docker-image-push dip | Push local Docker image to Docker Hub
docker-image-run dir | Run container with local Docker image
dockerhub-login dl | Login to Docker Hub as pivotalrabbitmq
down d | Stop all containers
find-latest-otp flo | Find latest OTP version archive + sha1
metrics m | Run all metrics containers
overview o | Make RabbitMQ Overview panels come alive
preview-readme pre | Preview README & live reload on edit
qq | Make RabbitMQ-Quorum-Queues-Raft panels come alive - HIGH LOAD
Before:
-------------------------------------------------------------------------------------------------
autocomplete ac | Configure shell for autocompletion - eval "$(gmake autocomplete)"
-------------------------------------------------------------------------------------------------
clean-docker cd | Clean all Docker containers & volumes
-------------------------------------------------------------------------------------------------
cto cto | Interact with all containers via a top-like utility
-------------------------------------------------------------------------------------------------
dockerhub-login dl | Login to Docker Hub as pivotalrabbitmq
-------------------------------------------------------------------------------------------------
docker-image di | Build & push Docker image to Docker Hub
-------------------------------------------------------------------------------------------------
docker-image-build dib | Build Docker image locally - make tests
-------------------------------------------------------------------------------------------------
docker-image-bump diu | Bump Docker image version across all docker-compose-* files
-------------------------------------------------------------------------------------------------
docker-image-push dip | Push local Docker image to Docker Hub
-------------------------------------------------------------------------------------------------
docker-image-run dir | Run container with local Docker image
-------------------------------------------------------------------------------------------------
down d | Stop all containers
-------------------------------------------------------------------------------------------------
find-latest-otp flo | Find latest OTP version archive + sha1
-------------------------------------------------------------------------------------------------
metrics m | Run all metrics containers
-------------------------------------------------------------------------------------------------
overview o | Make RabbitMQ Overview panels come alive
-------------------------------------------------------------------------------------------------
dist-tls dt | Make Erlang-Distribution panels come alive - HIGH LOAD
-------------------------------------------------------------------------------------------------
qq | Make RabbitMQ-Quorum-Queues-Raft panels come alive - HIGH LOAD
-------------------------------------------------------------------------------------------------
Some properties had queue_ appended, while others used messages_ instead
of message_. This meant that metrics such as rabbitmq_queue_consumers
were not reported correctly, as captured in https://github.com/rabbitmq/rabbitmq-prometheus/issues/9#issuecomment-558233464
The test needs fixing before this can be merged; it's currently failing with:
$ make ct-rabbit_prometheus_http t=with_metrics:metrics_test
== rabbit_prometheus_http_SUITE ==
* [with_metrics]
rabbit_prometheus_http_SUITE > with_metrics
{error,
{shutdown,
{gen_server,call,
[<0.245.0>,
{call,
{'basic.cancel',<<"amq.ctag-uHUunE5EoozMKYG8Bf6s1Q">>,
false},
none,<0.252.0>},
infinity]}}}
Closes #19
It captures the Quorum-Queues Raft, so let's be specific, especially
since we know that there will be other Raft implementations in RabbitMQ,
not just Quorum Queues.
[#166926415]
It is essential to know which RabbitMQ & Erlang/OTP version the cluster
is running, as well as how many nodes there are in the cluster. We now
have a table which lists this information, right under all singlestat
panels.
The singlestat panels have been re-organized to make room for 2 new
ones: Nodes & Publishers. Classic & Quorum Queues would be great to
have, as would VHosts. The last singlestats that I would add are Alarms
& Partitions. This would bring the total number of singlestat panels to
14 (we currently have 10). While 14 feels overwhelming, it captures all
the important information that I believe is worth knowing about any
RabbitMQ cluster.
All message-related sections now display 2 graph panels instead of 3.
While 3 panels look good on 27" screens, they don't work as well on 15"
screens, which is what the majority will be using. Also the 3rd panel
would always be for anti-pattern graphs (e.g. unroutable messages,
polling operations, etc.) and would be mostly empty in the majority of
cases. Fitting fewer panels per row not only helps with focusing on and
understanding what is being displayed, but also makes it easier to
compare when viewing 2 panels side-by-side, on 27" screens. Nodes &
churn sections still have 3 panels, which works well when 1 panel is
more important than the others. The compromise that we need to make is
between giving enough horizontal space to equally important panels vs
making the dashboard page too long. RabbitMQ-Overview has always been a
comprehensive dashboard which captures a lot of information; it was
always tough balancing the important vs the complete.
[finishes #167836027]
9.313226 GiB is a lot harder to read than 9.31 GiB, and therefore less
useful. Observing other people use this made it obvious that limiting
the precision was the human-friendly thing to do.
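In Grafana this is a panel decimals/unit setting; as a minimal Python sketch of the same formatting idea (the helper name is mine):

```python
def human_bytes(value, precision=2):
    """Render a byte count with limited precision, e.g. '9.31 GiB'."""
    units = ["B", "KiB", "MiB", "GiB", "TiB"]
    for unit in units:
        # stop at the first unit that keeps the value below 1024
        if abs(value) < 1024 or unit == units[-1]:
            return f"{value:.{precision}f} {unit}"
        value /= 1024

print(human_bytes(9.313226 * 1024**3))  # '9.31 GiB'
print(human_bytes(512))                 # '512.00 B'
```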
* explains source of metrics via row names
* makes tables slightly wider to mitigate long names line wrapping
* do not limit entries in tables, since refresh resets table pagination
[finishes #168734621]
The yardstick for all Grafana dashboards should be 1920 x 1200, the
screen format most common in our team. If the dashboards look good on
our screens, they will look good on other screens too. Smaller
resolutions won't look too crammed, and bigger resolutions can be split
in half (e.g. 27" iMacs).
Some take-aways from optimising the layout of this dashboard:
* limit horizontal graph panels to 3
* limit horizontal panels to 2 if the information is dense (e.g. table + graph)
* use the same width for graph panels that need comparing, stack vertically
To get an import-friendly RabbitMQ Overview dashboard, run the following
command:
make RabbitMQ-Overview.json
On macOS, to send this output to clipboard:
make RabbitMQ-Overview.json | pbcopy
This is the preferred alternative to
9aa22e1895
See
dae49b5c08
for more context. cc @mkuratczyk
This commit introduces a few other somewhat related changes:
* BASH autocompletion for make targets - make ac
* descriptions for all custom targets - make h
* continuous feedback loops for ac & h targets - make CFac
I would really like to see some of the above features be part of
erlang.mk. What do you think @essen? Anything in particular that you
would like me to PR?
@dumbbell, my other Make partner-in-crime, may be interested in
discussing the above ;)
== LINKS
* https://medium.com/@lavieenroux20/how-to-win-friends-influence-people-and-autocomplete-makefile-targets-e6cd228d856d
* https://github.com/Bash-it/bash-it/blob/master/completion/available/makefile.completion.bash
While __inputs are required for the dashboards to work in environments
where Prometheus is not the default datasource, it breaks the local
development flow. In other words,
9aa22e1895
prevents `make metrics overview` from working as designed.
We are shortly going to add a simple way of converting the local
dashboards into a format that can be imported into Grafana and will work
when Prometheus is not the default datasource (e.g. when using
https://github.com/coreos/kube-prometheus).
Long-term, these dashboards will be available via grafana.com, which is
the preferred way of consuming them.
cc @mkuratczyk
Some metrics were of type gauge while they should have been of type
counter. Thanks @brian-brazil for making the distinction clear. This is
now captured as a comment above the metric definitions.
Because all metrics are from RabbitMQ's perspective, cached for up to 5
seconds by default (configurable), we prepend `rabbitmq_` to all metrics
emitted by this collector. While some metrics are for Erlang (erlang_),
Mnesia (schema_db_) or the System (io_), they are all observed & cached
by RabbitMQ, hence the prefix.
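A minimal sketch of the prefixing rule (the metric names below are illustrative, and the helper is mine, not the collector's API):

```python
# Every metric observed & cached by RabbitMQ gets the rabbitmq_ prefix,
# regardless of whether the underlying subsystem is Erlang, Mnesia or the OS.

def collector_name(name):
    return name if name.startswith("rabbitmq_") else "rabbitmq_" + name

for metric in ["erlang_processes_used", "io_read_ops_total", "rabbitmq_connections"]:
    print(collector_name(metric))
```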
This is the last PR which started in the context of prometheus/docs#1414
[#167846096]
As described in
https://prometheus.io/docs/instrumenting/writing_clientlibs/#process-metrics.
Until prometheus.erl has the prometheus_process_collector functionality
built in (this may not happen), we are exposing a subset of those
metrics via rabbitmq_core_metrics_collector, so we are going to stick to
the expected naming conventions.
This commit supersedes the thought process captured in
1e5f4de4cb
[#167846096]
While `process_open_fds` would have been ideal, the value is cached
within RabbitMQ and computed differently across platforms, so it is
important to keep the distinction from, say, what the kernel reports
just-in-time.
I am also capturing the Erlang context by adding `erlang_` to the
relevant metrics. The full context is: RabbitMQ observed this Erlang VM
process metric to be X, so this is why some metrics are prefixed with
`rabbitmq_erlang_process_`. This matters because there is a difference
between what RabbitMQ limits are set to, e.g.
`rabbitmq_memory_used_limit_bytes`, vs. what RabbitMQ reports about
the Erlang process, e.g. `rabbitmq_erlang_process_memory_used_bytes`.
This is the best that we can do while staying honest about what is being
reported. cc @brian-brazil
[#167846096]
This started in the context of prometheus/docs#1414, specifically
https://github.com/prometheus/docs/pull/1414#issuecomment-520505757
Rather than labelling all metrics with the same label, we are
introducing 2 new metrics: rabbitmq_build_info & rabbitmq_identity_info.
I suspect that we may want to revert deadtrickster/prometheus.erl#91
when we agree that the proposed alternative is better.
We have yet to follow through with changes to the Grafana dashboards. I
am most interested in what the updated queries will look like and, more
importantly, whether we will have the same panels as we do now. More
commits to follow shortly; wanted to get this out the door first.
In summary, this commit changes the output from this:
# TYPE erlang_mnesia_held_locks gauge
# HELP erlang_mnesia_held_locks Number of held locks.
erlang_mnesia_held_locks{node="rabbit@920f1e3272af",cluster="rabbit@920f1e3272af",rabbitmq_version="3.8.0-alpha.806",erlang_version="22.0.7"} 0
# TYPE erlang_mnesia_lock_queue gauge
# HELP erlang_mnesia_lock_queue Number of transactions waiting for a lock.
erlang_mnesia_lock_queue{node="rabbit@920f1e3272af",cluster="rabbit@920f1e3272af",rabbitmq_version="3.8.0-alpha.806",erlang_version="22.0.7"} 0
...
To this:
# TYPE erlang_mnesia_held_locks gauge
# HELP erlang_mnesia_held_locks Number of held locks.
erlang_mnesia_held_locks 0
# TYPE erlang_mnesia_lock_queue gauge
# HELP erlang_mnesia_lock_queue Number of transactions waiting for a lock.
erlang_mnesia_lock_queue 0
...
# TYPE rabbitmq_build_info untyped
# HELP rabbitmq_build_info RabbitMQ & Erlang/OTP version info
rabbitmq_build_info{rabbitmq_version="3.8.0-alpha.809",prometheus_plugin_version="3.8.0-alpha.809-2019.08.15",prometheus_client_version="4.4.0",erlang_version="22.0.7"} 1
# TYPE rabbitmq_identity_info untyped
# HELP rabbitmq_identity_info Node & cluster identity info
rabbitmq_identity_info{node="rabbit@bc7aeb0c2564",cluster="rabbit@bc7aeb0c2564"} 1
...
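The two `_info` metrics follow the common Prometheus "info metric" pattern: a constant sample value of 1, with the actual data carried in labels. A rough Python sketch of rendering one (the helper name is mine):

```python
def info_metric(name, help_text, labels):
    """Render an info-style metric: constant sample value 1, data in labels."""
    label_str = ",".join(f'{key}="{value}"' for key, value in labels.items())
    return (f"# TYPE {name} untyped\n"
            f"# HELP {name} {help_text}\n"
            f"{name}{{{label_str}}} 1")

print(info_metric("rabbitmq_identity_info",
                  "Node & cluster identity info",
                  {"node": "rabbit@bc7aeb0c2564",
                   "cluster": "rabbit@bc7aeb0c2564"}))
```

In PromQL, such metrics are typically joined onto other series with `* on (...) group_left (...)` so that every metric doesn't need to carry the same labels.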
[#167846096]
We want to use a consistent range for all metrics that use rate() and a
safe value (4x the Prometheus scrape interval):
https://www.robustperception.io/what-range-should-i-use-with-rate
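The rule of thumb from the linked article, as a trivial sketch:

```python
# Make the rate() range at least 4x the scrape interval, so that one
# failed scrape still leaves at least two samples inside the window.

def safe_rate_range(scrape_interval_s, factor=4):
    return f"{scrape_interval_s * factor}s"

print(safe_rate_range(15))  # '60s' for the default 15s scrape interval
```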
This also prompted a change in RabbitMQ's default
collect_statistics_interval, so that we don't update metrics
unnecessarily. We are OK if the Management UI doesn't update on every 5s
auto-refresh.
Related: a929f22233
[#167846096]
Started as a Prometheus docs discussion in prometheus/docs#1414, mostly
based on https://prometheus.io/docs/instrumenting/writing_exporters/
Raft metrics are of type gauge, not counter. _If you care about the
absolute value rather than only how fast it's increasing, that's a
gauge_
All node_persister_metrics are now counters - some were gauges before.
They are now named using metric naming best practices:
https://prometheus.io/docs/practices/naming/
All metric names that should have units, do. Some use microseconds,
others milliseconds and others bytes or ops (operations). We don't do
any unit conversion in the collector but simply expose the units that
are used when the metric value is written to ETS.
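A small sketch of the naming rule (names and helper are illustrative, not the collector's actual code): keep the stored value unchanged and encode the unit in the metric name, with `_total` marking counters.

```python
# Encode the unit of the stored value in the metric name rather than
# converting the value; counters additionally get the _total suffix.

def with_unit(base_name, unit, is_counter):
    name = f"{base_name}_{unit}"
    return name + "_total" if is_counter else name

print(with_unit("io_sync_time", "microseconds", True))  # io_sync_time_microseconds_total
print(with_unit("queue_disk_size", "bytes", False))     # queue_disk_size_bytes
```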
While some metrics such as io_sync_time_microseconds_total would be
better expressed as Summaries, the refactoring required to achieve that
is not worth the effort. Will keep things simple & imperfect for now,
especially since we don't have a dashboard that helps visualise these
metrics.
The next step is to address global labels - will submit as a separate
PR.
[#167846096]
Now that there is a 3.8 alpha build that includes
rabbitmq/rabbitmq-server#2075, let's make use of it!
Without this, when a new cluster was started, some nodes ended up with
`rabbit@localhost` for the cluster label, instead of e.g. `rmq-gcp-38`.
The main suspect was a race condition, where the rabbitmq_prometheus app
starts before the cluster name is set via `rabbitmqctl
set_cluster_name`.
[finishes #167835770]
It's hard to understand what the different colours mean otherwise. Also,
yellow is preferable to purple when it comes to displaying runnable
processes - those stuck in the run queue.
cc @michaelklishin
It explains the correlation between inet packets & TCP packets, and why
the inet packet size varies when TLS is used for inter-node
communication.
[finishes #166419953]
It makes a big difference for stable throughput. See screenshots from
https://bugs.erlang.org/browse/ERL-959
We need to test this in a real network (I'm thinking GCP), outside of
Docker. The results will inform whether we should change the default,
which is 1436 bytes.
[#166419953]
Add cadvisor & node-exporter & Docker metrics.
Inspired by https://github.com/stefanprodan/dockprom
There are no Grafana dashboards for these metrics yet. The dockprom ones
don't show any panels in Grafana 6.
[#165818813]
Even though this slows down Grafana container startup, we need to ensure
that this plugin is present, otherwise the panels that track process
state won't work. This will be slow the first time the plugin is
downloaded, and slightly faster on subsequent runs.
[#166004512]
* pin nodes to specific colours
* add message-related single-stats
* reshuffle rows
* node metrics are most useful
* queue, channel & connection churn are least useful
Includes Erlang node to colour pinning
Adds a few make targets to help with docker-compose repetitive commands
& Grafana dashboard updates.
Split Overview & Distribution Docker deployments
re deadtrickster/prometheus.erl#92
[finishes #166004512]
We (+@essen) have answered a bunch of questions (see the story) and
improved the metrics + dashboard in the process. Added some improvements
to the RabbitMQ Overview metrics as well.
[#166004104]
This puts load on the distribution and makes the Erlang-Distribution
dashboard show an interesting behaviour in TCP sockets. @dcorbacho
thinks so too.
re deadtrickster/prometheus.erl#92
[#166004512]
Use 1m instead of $__interval for rates that track metrics with a slow
rate of change. Using $__interval will miss changes.
Stop rounding, it skews values.
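The first point can be illustrated with a toy simulation (all values invented): a counter that increments every 90s shows a rate of 0 whenever the lookback window is shorter than the increment period.

```python
# Why a short $__interval misses slow-moving counters: simulate a counter
# that increments once every 90s and compute a rate over a 30s vs a 120s
# lookback window (timestamps in seconds).

samples = {t: t // 90 for t in range(0, 361, 30)}  # counter value at time t

def rate(t, window):
    return (samples[t] - samples[t - window]) / window

print(rate(120, 30))   # 0.0 - no increment fell inside the short window
print(rate(120, 120))  # the longer window always catches the change
```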
All `basic.get` metrics are bad. The 0 threshold and the red colour for
all lines is hopefully enough to convey this.
re rabbitmq/rabbitmq-perf-test#203
[finishes #165852775]
Otherwise it's really hard to know what we are looking at when expanding
panels.
Also, pin to colours. Otherwise, rabbit@rabbitmq1 metrics in one panel
will appear yellow, and green in another panel. This is a one-off
which doesn't scale, should be automated in some way. Grafana doesn't
support pinning colors to labels 🤔
This includes the global_labels feature introduced in deadtrickster/prometheus.erl#91
To test, run `docker-compose up` in docker dir, then navigate to
localhost:15692/metrics & localhost:3000/dashboards (admin:admin) to see
the Grafana RabbitMQ Overview dashboard.
Add nodes, alarms & partitions to global counts. These are too important
not to show. Need to discuss how to expose these via metrics.
[#164374397]
Set memory high watermark to 256MiB to force trigger the memory alarm,
as well as ensure messages get paged to disk (forces disk reads).
Make all legends display as table so that values are easier to see when
toggling them.
This produces a bad rabbitmq-server build, perf-test crashes & so do
rabbit_channels. Will build a full rabbitmq-server-generic-unix locally,
this mix & matching is definitely trouble.
publisher-confirms_1 | Main thread caught exception: java.io.IOException
publisher-confirms_1 | 13:07:38.003 [main] ERROR com.rabbitmq.perf.PerfTest - Main thread caught exception
publisher-confirms_1 | java.io.IOException: null
publisher-confirms_1 | at com.rabbitmq.client.impl.AMQChannel.wrap(AMQChannel.java:129)
publisher-confirms_1 | at com.rabbitmq.client.impl.AMQChannel.wrap(AMQChannel.java:125)
publisher-confirms_1 | at com.rabbitmq.client.impl.AMQChannel.exnWrappingRpc(AMQChannel.java:147)
publisher-confirms_1 | at com.rabbitmq.client.impl.ChannelN.open(ChannelN.java:133)
publisher-confirms_1 | at com.rabbitmq.client.impl.ChannelManager.createChannel(ChannelManager.java:182)
publisher-confirms_1 | at com.rabbitmq.client.impl.AMQConnection.createChannel(AMQConnection.java:555)
publisher-confirms_1 | at com.rabbitmq.client.impl.recovery.AutorecoveringConnection.createChannel(AutorecoveringConnection.java:165)
publisher-confirms_1 | at com.rabbitmq.perf.MulticastParams$TopologyHandlerSupport.configureQueues(MulticastParams.java:616)
publisher-confirms_1 | at com.rabbitmq.perf.MulticastParams$FixedQueuesTopologyHandler.configureQueuesForClient(MulticastParams.java:699)
publisher-confirms_1 | at com.rabbitmq.perf.MulticastParams.createConsumer(MulticastParams.java:405)
publisher-confirms_1 | at com.rabbitmq.perf.MulticastSet.createConsumers(MulticastSet.java:244)
publisher-confirms_1 | at com.rabbitmq.perf.MulticastSet.run(MulticastSet.java:126)
publisher-confirms_1 | at com.rabbitmq.perf.PerfTest.main(PerfTest.java:276)
publisher-confirms_1 | at com.rabbitmq.perf.PerfTest.main(PerfTest.java:374)
publisher-confirms_1 | Caused by: com.rabbitmq.client.ShutdownSignalException: connection error
publisher-confirms_1 | at com.rabbitmq.utility.ValueOrException.getValue(ValueOrException.java:66)
publisher-confirms_1 | at com.rabbitmq.utility.BlockingValueOrException.uninterruptibleGetValue(BlockingValueOrException.java:36)
publisher-confirms_1 | at com.rabbitmq.client.impl.AMQChannel$BlockingRpcContinuation.getReply(AMQChannel.java:502)
publisher-confirms_1 | at com.rabbitmq.client.impl.AMQChannel.privateRpc(AMQChannel.java:293)
publisher-confirms_1 | at com.rabbitmq.client.impl.AMQChannel.exnWrappingRpc(AMQChannel.java:141)
publisher-confirms_1 | ... 11 common frames omitted
publisher-confirms_1 | Caused by: java.net.SocketException: Connection reset
publisher-confirms_1 | at java.base/java.net.SocketInputStream.read(SocketInputStream.java:186)
publisher-confirms_1 | at java.base/java.net.SocketInputStream.read(SocketInputStream.java:140)
publisher-confirms_1 | at java.base/java.io.BufferedInputStream.fill(BufferedInputStream.java:252)
publisher-confirms_1 | at java.base/java.io.BufferedInputStream.read(BufferedInputStream.java:271)
publisher-confirms_1 | at java.base/java.io.DataInputStream.readUnsignedByte(DataInputStream.java:293)
publisher-confirms_1 | at com.rabbitmq.client.impl.Frame.readFrom(Frame.java:91)
publisher-confirms_1 | at com.rabbitmq.client.impl.SocketFrameHandler.readFrame(SocketFrameHandler.java:164)
publisher-confirms_1 | at com.rabbitmq.client.impl.AMQConnection$MainLoop.run(AMQConnection.java:598)
publisher-confirms_1 | at java.base/java.lang.Thread.run(Thread.java:834)
rabbitmq1_1 | 2019-04-25 12:40:53.778 [info] <0.1215.0> accepting AMQP connection <0.1215.0> (172.25.0.7:38752 -> 172.25.0.4:5672)
rabbitmq1_1 | 2019-04-25 12:40:53.840 [info] <0.1215.0> Connection <0.1215.0> (172.25.0.7:38752 -> 172.25.0.4:5672) has a client-provided name: perf-test-test
rabbitmq1_1 | 2019-04-25 12:40:53.849 [info] <0.1215.0> connection <0.1215.0> (172.25.0.7:38752 -> 172.25.0.4:5672 - perf-test-test): user 'guest' authenticated and granted access to vhost '/'
rabbitmq1_1 | 2019-04-25 12:40:53.855 [info] <0.1215.0> closing AMQP connection <0.1215.0> (172.25.0.7:38752 -> 172.25.0.4:5672 - perf-test-test, vhost: '/', user: 'guest')
rabbitmq1_1 | 2019-04-25 12:40:53.860 [info] <0.1224.0> accepting AMQP connection <0.1224.0> (172.25.0.7:38754 -> 172.25.0.4:5672)
rabbitmq1_1 | 2019-04-25 12:40:53.862 [info] <0.1224.0> Connection <0.1224.0> (172.25.0.7:38754 -> 172.25.0.4:5672) has a client-provided name: perf-test-configuration
rabbitmq1_1 | 2019-04-25 12:40:53.864 [info] <0.1224.0> connection <0.1224.0> (172.25.0.7:38754 -> 172.25.0.4:5672 - perf-test-configuration): user 'guest' authenticated and granted access to vhost '/'
rabbitmq1_1 | 2019-04-25 12:40:53.877 [info] <0.1231.0> accepting AMQP connection <0.1231.0> (172.25.0.7:38756 -> 172.25.0.4:5672)
rabbitmq1_1 | 2019-04-25 12:40:53.880 [info] <0.1231.0> Connection <0.1231.0> (172.25.0.7:38756 -> 172.25.0.4:5672) has a client-provided name: perf-test-consumer-0
rabbitmq1_1 | 2019-04-25 12:40:53.882 [info] <0.1231.0> connection <0.1231.0> (172.25.0.7:38756 -> 172.25.0.4:5672 - perf-test-consumer-0): user 'guest' authenticated and granted access to vhost '/'
rabbitmq1_1 | 2019-04-25 12:40:53.890 [error] <0.1239.0> CRASH REPORT Process <0.1239.0> with 0 neighbours exited with reason: no match of right hand value undefined in rabbit_channel:init_queue_cleanup_timer/1 line 2604 in gen_server2:init_it/6 line 597
rabbitmq1_1 | 2019-04-25 12:40:53.891 [error] <0.1231.0> CRASH REPORT Process <0.1231.0> with 0 neighbours crashed with reason: no match of right hand value {error,{'EXIT',{{badmatch,{error,{{{badmatch,undefined},[{rabbit_channel,init_queue_cleanup_timer,1,[{file,"src/rabbit_channel.erl"},{line,2604}]},{rabbit_channel,init,1,[{file,"src/rabbit_channel.erl"},{line,528}]},{gen_server2,init_it,6,[{file,"src/gen_server2.erl"},{line,554}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,249}]}]},{child,undefined,channel,{rabbit_channel,start_link,[1,<0.1231.0>,<0.1237.0>,<0.1231.0>,<<"172.25.0.7:38756 -> 172.25.0.4:5672">>,rabbit_framing_amqp_0_9_1,...]},...}}}},...}}} in rabbit_reader:create_channel/2 line 923
rabbitmq1_1 | 2019-04-25 12:40:53.891 [error] <0.1229.0> Supervisor {<0.1229.0>,rabbit_connection_sup} had child reader started with rabbit_reader:start_link(<0.1230.0>, {acceptor,{0,0,0,0,0,0,0,0},5672}) at <0.1231.0> exit with reason no match of right hand value {error,{'EXIT',{{badmatch,{error,{{{badmatch,undefined},[{rabbit_channel,init_queue_cleanup_timer,1,[{file,"src/rabbit_channel.erl"},{line,2604}]},{rabbit_channel,init,1,[{file,"src/rabbit_channel.erl"},{line,528}]},{gen_server2,init_it,6,[{file,"src/gen_server2.erl"},{line,554}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,249}]}]},{child,undefined,channel,{rabbit_channel,start_link,[1,<0.1231.0>,<0.1237.0>,<0.1231.0>,<<"172.25.0.7:38756 -> 172.25.0.4:5672">>,rabbit_framing_amqp_0_9_1,...]},...}}}},...}}} in rabbit_reader:create_channel/2 line 923 in context child_terminated
rabbitmq1_1 | 2019-04-25 12:40:53.891 [error] <0.1229.0> Supervisor {<0.1229.0>,rabbit_connection_sup} had child reader started with rabbit_reader:start_link(<0.1230.0>, {acceptor,{0,0,0,0,0,0,0,0},5672}) at <0.1231.0> exit with reason reached_max_restart_intensity in context shutdown
rabbitmq1_1 | 2019-04-25 12:40:54.376 [warning] <0.1224.0> closing AMQP connection <0.1224.0> (172.25.0.7:38754 -> 172.25.0.4:5672 - perf-test-configuration, vhost: '/', user: 'guest'):
rabbitmq1_1 | client unexpectedly closed TCP connection
Capture limits in thresholds. Even if they are static and somewhat
specific to this RabbitMQ deployment, it's better to have them when
demo-ing the end-to-end Prometheus/Grafana experience.
[#164374751]
This lights up `Published confirmed / s` Grafana panel.
To light up `Published unroutable / s`, unbind all queues from the
direct exchange.
[#164374751]
This has support for disabling metrics_collector, as captured in
rabbitmq/rabbitmq-management-agent#78 & rabbitmq/rabbitmq-management#691
Since we want management to be enabled, this doesn't help our use-case,
but this option is perfect for users that want metrics, but don't want
to pay the overhead of Management - especially metric aggregations.
[#164376052]
After running `docker-compose up`, open Grafana via
http://localhost:3000 and login with user admin & password admin. After
logging in, you will see a RabbitMQ Overview dashboard pre-loaded (/・0・)
Thanks @cirocosta! https://github.com/cirocosta/sample-grafana
cc @MarcialRosales
[finishes #164374321]
Captures all node metrics shown on the Overview page:
* File descriptors
* Socket descriptors
* Erlang processes
* Memory
* Disk
Not displaying any limits since they would make the variations
impossible to see. For example, when file descriptors go from 90 to 30,
if one of the metrics on the graph is 1048576 (Docker image default for
rabbitmq_node_sockets_total), it's impossible to see the metric change
from 90 to 30. The same problem is present in the current RabbitMQ Management
graphs on the node page, under Node statistics.
No thresholds have been set. Threshold values must be defined as
integers in Grafana 6; we can't reference metrics, e.g.
rabbitmq_node_sockets_total. Templating the dashboard would be one way,
but the problem with that is keeping it in sync with limits. It's a more
difficult problem than meets the eye, deferring it for now.
Created on Grafana v6.1
[finishes #164374321]
Bumping all prometheus-related deps to latest stable. Defining them in
rabbitmq-components.mk, so that they can be promoted to all deps in
umbrella.
rabbitmq_management_agent is required for alarm-related metrics to be
available.
Added node label to most `rabbitmq_` metrics. I need help adding them to
mfa_totals - metrics_node_label_test test currently fails. The new unit
tests ensure that label/0 behaves as expected in all cases - made
refactoring easy. Run unit tests via:
gmake eunit EUNIT_MODS=prometheus_rabbitmq_core_metrics_collector
Updating to latest erlang.mk makes running eunit tests much faster: 2s
vs 10s. To do this, comment `ERLANG_MK_*` in the Makefile and run `gmake
erlang-mk`.