---
title: Getting started
sort_rank: 1
---
					
						
This guide is a "Hello World"-style tutorial that shows how to install,
configure, and use a simple Prometheus instance. You will download and run
Prometheus locally, configure it to scrape itself and an example application,
and then work with queries, rules, and graphs to make use of the collected
time series data.

## Downloading and running Prometheus

[Download the latest release](https://prometheus.io/download) of Prometheus for
your platform, then extract it:

```bash
tar xvfz prometheus-*.tar.gz
cd prometheus-*
```

Before starting Prometheus, let's configure it.
					
						
## Configuring Prometheus to monitor itself

Prometheus collects metrics from _targets_ by scraping their metrics HTTP
endpoints. Since Prometheus exposes data about itself in the same manner, it
can also scrape and monitor its own health.

While a Prometheus server that collects only data about itself is not very
useful, it is a good starting example. Save the following basic Prometheus
configuration as a file named `prometheus.yml`:

```yaml
global:
  scrape_interval:     15s # By default, scrape targets every 15 seconds.

  # Attach these labels to any time series or alerts when communicating with
  # external systems (federation, remote storage, Alertmanager).
  external_labels:
    monitor: 'codelab-monitor'

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any time series scraped from this config.
  - job_name: 'prometheus'

    # Override the global default and scrape targets from this job every 5 seconds.
    scrape_interval: 5s

    static_configs:
      - targets: ['localhost:9090']
```

For a complete specification of configuration options, see the
[configuration documentation](configuration/configuration.md).
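
The interplay between the global `scrape_interval` and the per-job override can be sketched in Python. This is a minimal illustration, not Prometheus code. It assumes `prometheus.yml` has already been parsed into plain dictionaries. `parse_duration` and `effective_scrape_interval` are hypothetical helper names, and the duration parser accepts only single-unit values such as `15s` or `5m`, not the full Prometheus duration syntax. When neither level sets an interval, the sketch falls back to Prometheus's built-in default of one minute.

```python
import re

# Seconds per unit, for a subset of the units Prometheus durations support.
_UNIT_SECONDS = {"ms": 0.001, "s": 1, "m": 60, "h": 3600, "d": 86400}


def parse_duration(text: str) -> float:
    """Parse a single-unit duration such as '15s' or '5m' into seconds."""
    match = re.fullmatch(r"(\d+)(ms|s|m|h|d)", text.strip())
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    value, unit = match.groups()
    return int(value) * _UNIT_SECONDS[unit]


def effective_scrape_interval(config: dict, job_name: str) -> float:
    """Return the scrape interval (in seconds) that applies to one job.

    A job-level scrape_interval wins; otherwise the global value applies;
    otherwise the built-in default of 1m is used.
    """
    default = config.get("global", {}).get("scrape_interval", "1m")
    for job in config.get("scrape_configs", []):
        if job["job_name"] == job_name:
            return parse_duration(job.get("scrape_interval", default))
    raise KeyError(f"no scrape config for job {job_name!r}")


# The configuration above, expressed as already-parsed YAML.
config = {
    "global": {"scrape_interval": "15s"},
    "scrape_configs": [
        {
            "job_name": "prometheus",
            "scrape_interval": "5s",
            "static_configs": [{"targets": ["localhost:9090"]}],
        },
    ],
}
print(effective_scrape_interval(config, "prometheus"))  # 5
```

With the configuration above, the `prometheus` job resolves to 5 seconds, while a job without its own `scrape_interval` would inherit the global 15 seconds.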
					
						
## Starting Prometheus

To start Prometheus with your newly created configuration file, change to the
directory containing the Prometheus binary and run:

```bash
# Start Prometheus.
# By default, Prometheus stores its database in ./data (flag --storage.tsdb.path).
./prometheus --config.file=prometheus.yml
```

Prometheus should start up. You should also be able to browse to a status page
about itself at [localhost:9090](http://localhost:9090). Give it a couple of
seconds to collect data about itself from its own HTTP metrics endpoint.

You can also verify that Prometheus is serving metrics about itself by
navigating to its metrics endpoint at
[localhost:9090/metrics](http://localhost:9090/metrics).
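
The metrics endpoint returns plain text in the Prometheus text exposition format: `# HELP` and `# TYPE` comment lines, followed by one sample per line made up of a metric name, an optional `{label="value",...}` set, a value, and an optional timestamp. To make that layout concrete, here is a minimal Python parser sketch. It is not a complete implementation. It assumes label values contain no escaped quotes, and it discards timestamps. The sample text is made up for illustration.

```python
import re

# One sample line: metric name, optional {labels}, value, optional timestamp.
SAMPLE_RE = re.compile(
    r"^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)(?:\s+-?\d+)?$"
)
# One label pair inside the braces. Escaped quotes are NOT handled.
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def parse_exposition(text: str) -> list[tuple[str, dict[str, str], float]]:
    """Return (metric name, labels, value) for every sample line in `text`."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and # HELP / # TYPE comments
        match = SAMPLE_RE.match(line)
        if match is None:
            raise ValueError(f"cannot parse line: {line!r}")
        name, label_text, value = match.groups()
        labels = dict(LABEL_RE.findall(label_text or ""))
        # float() also accepts the special values NaN, +Inf and -Inf.
        samples.append((name, labels, float(value)))
    return samples


example = """\
# HELP prometheus_target_interval_length_seconds Actual intervals between scrapes.
# TYPE prometheus_target_interval_length_seconds summary
prometheus_target_interval_length_seconds{interval="5s",quantile="0.99"} 5.0021
prometheus_target_interval_length_seconds_count{interval="5s"} 42
"""
for name, labels, value in parse_exposition(example):
    print(name, labels, value)
```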
					
						
## Using the expression browser

Let us explore the data that Prometheus has collected about itself. To
use Prometheus's built-in expression browser, navigate to
http://localhost:9090/graph and choose the "Table" view within the "Graph" tab.

As you can gather from [localhost:9090/metrics](http://localhost:9090/metrics),
one metric that Prometheus exports about itself is named
`prometheus_target_interval_length_seconds` (the actual amount of time between
target scrapes). Enter the following expression into the expression console and
then click "Execute":

```
prometheus_target_interval_length_seconds
```

This should return a number of different time series (along with the latest value
recorded for each), each with the metric name
`prometheus_target_interval_length_seconds`, but with different labels. These
labels designate different latency percentiles and target group intervals.

If we are interested only in 99th percentile latencies, we could use this
query:

```
prometheus_target_interval_length_seconds{quantile="0.99"}
```

To count the number of returned time series, you could write:

```
count(prometheus_target_interval_length_seconds)
```

For more about the expression language, see the
[expression language documentation](querying/basics.md).
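
The same expressions can also be evaluated outside the browser through Prometheus's HTTP API, whose instant-query endpoint is `/api/v1/query`. The following is a minimal sketch using only the Python standard library: it builds the request URL and flattens the JSON response (`{"status": "success", "data": {"resultType": "vector", "result": [...]}}`) into `(labels, value)` pairs. The function names are hypothetical, error handling is deliberately thin, and `instant_query` assumes the server from this guide is listening on `localhost:9090`.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def query_url(base_url: str, expr: str) -> str:
    """Build the URL for an instant query against the HTTP API."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': expr})}"


def parse_vector(body: str) -> list[tuple[dict[str, str], float]]:
    """Turn an instant-vector JSON response into (labels, value) pairs."""
    payload = json.loads(body)
    if payload.get("status") != "success":
        raise RuntimeError(payload.get("error", "query failed"))
    data = payload["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"expected a vector, got {data['resultType']}")
    # Each sample's "value" is [unix_timestamp, "value encoded as a string"].
    return [(item["metric"], float(item["value"][1])) for item in data["result"]]


def instant_query(expr: str, base_url: str = "http://localhost:9090"):
    """Evaluate one PromQL expression against a running Prometheus server."""
    with urlopen(query_url(base_url, expr), timeout=10) as response:
        return parse_vector(response.read().decode("utf-8"))
```

With Prometheus running, `instant_query('count(prometheus_target_interval_length_seconds)')` should return a single pair with an empty label set, because `count` without a `by` clause aggregates all labels away.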
					
						
## Using the graphing interface

To graph expressions, navigate to http://localhost:9090/graph and use the "Graph"
tab.

For example, enter the following expression to graph the per-second rate of chunks
being created in the self-scraped Prometheus:

```
rate(prometheus_tsdb_head_chunks_created_total[1m])
```

Experiment with the graph range parameters and other settings.
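
Conceptually, `rate()` looks at the counter samples inside the range window (here `[1m]`), compensates for counter resets, and divides the increase by the elapsed time. The Python sketch below shows only that core idea, under simplifying assumptions. It takes time-ordered `(timestamp, value)` pairs with distinct timestamps, treats any decrease as a counter reset to zero, and omits the extrapolation to the window boundaries that PromQL's real `rate()` performs, so its numbers will differ somewhat from the graph. `simple_rate` is a hypothetical name and the sample data is made up.

```python
def simple_rate(samples: list[tuple[float, float]]) -> float:
    """Per-second increase of a counter from time-ordered (timestamp, value) pairs.

    Simplified: unlike PromQL's rate(), no extrapolation to the window edges.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    increase = 0.0
    previous = samples[0][1]
    for _, value in samples[1:]:
        if value < previous:
            # Counter reset: the counter restarted from zero, so the whole
            # new value counts as increase.
            increase += value
        else:
            increase += value - previous
        previous = value
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed


# Four scrapes 15 seconds apart; the counter resets between the 2nd and 3rd.
window = [(0, 100.0), (15, 130.0), (30, 10.0), (45, 40.0)]
print(simple_rate(window))
```

In this made-up window the counter grows by 30, then by 10 after the reset, then by 30 again, so the sketch reports 70 over 45 seconds.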
					
						
## Starting up some sample targets

Let's add additional targets for Prometheus to scrape.

The Node Exporter is used as an example target. For more information on using
it, [see these instructions](https://prometheus.io/docs/guides/node-exporter/).

```bash
tar -xzvf node_exporter-*.*.tar.gz
cd node_exporter-*.*

# Start 3 example targets in separate terminals:
./node_exporter --web.listen-address 127.0.0.1:8080
./node_exporter --web.listen-address 127.0.0.1:8081
./node_exporter --web.listen-address 127.0.0.1:8082
```

You should now have example targets listening on http://localhost:8080/metrics,
http://localhost:8081/metrics, and http://localhost:8082/metrics.
					
						
## Configure Prometheus to monitor the sample targets

Now we will configure Prometheus to scrape these new targets. Let's group all
three endpoints into one job called `node`. We will imagine that the
first two endpoints are production targets, while the third one represents a
canary instance. To model this in Prometheus, we can add several groups of
endpoints to a single job, adding extra labels to each group of targets. In
this example, we will add the `group="production"` label to the first group of
targets, while adding `group="canary"` to the second.

To achieve this, add the following job definition to the `scrape_configs`
section in your `prometheus.yml` and restart your Prometheus instance:

```yaml
scrape_configs:
  - job_name: 'node'

    # Override the global default and scrape targets from this job every 5 seconds.
    scrape_interval: 5s

    static_configs:
      - targets: ['localhost:8080', 'localhost:8081']
        labels:
          group: 'production'

      - targets: ['localhost:8082']
        labels:
          group: 'canary'
```

Go to the expression browser and verify that Prometheus now has information
about time series that these example endpoints expose, such as `node_cpu_seconds_total`.
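
To see what the `node` job definition means for the scraped series, here is a small Python sketch that expands a job's `static_configs` into the per-target label sets attached to its series. It assumes Prometheus's default behavior when no relabeling is configured: `job` comes from `job_name`, `instance` is the target address, and each group's extra `labels` are added. It does not model relabeling or service discovery, and `target_label_sets` is a hypothetical name.

```python
def target_label_sets(job: dict) -> list[dict[str, str]]:
    """Expand one scrape job's static_configs into per-target label sets."""
    label_sets = []
    for group in job.get("static_configs", []):
        extra = group.get("labels", {})
        for address in group["targets"]:
            # Defaults without relabeling: job from job_name, instance = address.
            labels = {"job": job["job_name"], "instance": address}
            labels.update(extra)  # e.g. group="production" or group="canary"
            label_sets.append(labels)
    return label_sets


# The node job from the configuration above, as already-parsed YAML.
node_job = {
    "job_name": "node",
    "scrape_interval": "5s",
    "static_configs": [
        {"targets": ["localhost:8080", "localhost:8081"],
         "labels": {"group": "production"}},
        {"targets": ["localhost:8082"], "labels": {"group": "canary"}},
    ],
}
for labels in target_label_sets(node_job):
    print(labels)
```

Every series scraped from `localhost:8082` therefore carries `group="canary"`, which is what lets a selector such as `node_cpu_seconds_total{group="canary"}` pick out the canary alone.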
					
						
## Configure rules for aggregating scraped data into new time series

Though not a problem in our example, queries that aggregate over thousands of
time series can get slow when computed ad hoc. To make this more efficient,
Prometheus can prerecord expressions into new persisted
time series via configured _recording rules_. Let's say we are interested in
recording the per-second rate of CPU time (`node_cpu_seconds_total`) averaged
over all CPUs per instance (but preserving the `job`, `instance` and `mode`
dimensions) as measured over a window of 5 minutes. We could write this as:

```
avg by (job, instance, mode) (rate(node_cpu_seconds_total[5m]))
```

Try graphing this expression.
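
`avg by (job, instance, mode)` groups its input series by those three labels, dropping all others (here `cpu` and `group`), and averages the values within each group. The Python sketch below mimics only that aggregation step on made-up per-CPU rate values; it does not evaluate `rate()` or PromQL in general. As in PromQL, a missing label is treated the same as an empty one. `avg_by` is a hypothetical name.

```python
from collections import defaultdict


def avg_by(series: list[tuple[dict[str, str], float]],
           keep: tuple[str, ...]) -> dict[tuple[str, ...], float]:
    """Average values per group; a group is identified by the `keep` label values."""
    sums: dict[tuple[str, ...], float] = defaultdict(float)
    counts: dict[tuple[str, ...], int] = defaultdict(int)
    for labels, value in series:
        # A missing label counts as an empty value, as in PromQL.
        key = tuple(labels.get(name, "") for name in keep)
        sums[key] += value
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


# Made-up per-CPU rates for one instance in "idle" and "user" mode.
rates = [
    ({"job": "node", "instance": "localhost:8080", "cpu": "0", "mode": "idle"}, 0.90),
    ({"job": "node", "instance": "localhost:8080", "cpu": "1", "mode": "idle"}, 0.70),
    ({"job": "node", "instance": "localhost:8080", "cpu": "0", "mode": "user"}, 0.05),
    ({"job": "node", "instance": "localhost:8080", "cpu": "1", "mode": "user"}, 0.25),
]
for key, value in avg_by(rates, ("job", "instance", "mode")).items():
    print(key, round(value, 3))
```

The four input series collapse into two output series, one per `mode`, because `cpu` is not among the kept labels.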
					
						
To record the time series resulting from this expression into a new metric
called `job_instance_mode:node_cpu_seconds:avg_rate5m`, create a file
with the following recording rule and save it as `prometheus.rules.yml`:

```yaml
groups:
- name: cpu-node
  rules:
  - record: job_instance_mode:node_cpu_seconds:avg_rate5m
    expr: avg by (job, instance, mode) (rate(node_cpu_seconds_total[5m]))
```

To make Prometheus pick up this new rule, add a `rule_files` statement to your
`prometheus.yml`. The config should now look like this:

```yaml
global:
  scrape_interval:     15s # By default, scrape targets every 15 seconds.
  evaluation_interval: 15s # Evaluate rules every 15 seconds.

  # Attach these labels to any time series or alerts when communicating with
  # external systems (federation, remote storage, Alertmanager).
  external_labels:
    monitor: 'codelab-monitor'

rule_files:
  - 'prometheus.rules.yml'

scrape_configs:
  - job_name: 'prometheus'

    # Override the global default and scrape targets from this job every 5 seconds.
    scrape_interval: 5s

    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'node'

    # Override the global default and scrape targets from this job every 5 seconds.
    scrape_interval: 5s

    static_configs:
      - targets: ['localhost:8080', 'localhost:8081']
        labels:
          group: 'production'

      - targets: ['localhost:8082']
        labels:
          group: 'canary'
```

Restart Prometheus with the new configuration and verify that a new time series
with the metric name `job_instance_mode:node_cpu_seconds:avg_rate5m`
is now available by querying it through the expression browser or graphing it.
					
						
## Reloading configuration

As mentioned in the [configuration documentation](configuration/configuration.md),
a Prometheus instance can have its configuration reloaded without restarting the
process by sending it the `SIGHUP` signal. On Linux, you can do this with
`kill -s SIGHUP <PID>`, replacing `<PID>` with your Prometheus process ID.
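
If you script your setup in Python, the `kill -s SIGHUP <PID>` step corresponds to `os.kill` with `signal.SIGHUP`. The sketch below assumes a POSIX system (the `SIGHUP` constant does not exist on Windows) and, for the optional PID lookup, that the `pgrep` utility is installed. Both function names are hypothetical.

```python
import os
import signal
import subprocess


def find_pids(process_name: str = "prometheus") -> list[int]:
    """Return the PIDs of processes named exactly `process_name` (uses pgrep)."""
    result = subprocess.run(["pgrep", "-x", process_name],
                            capture_output=True, text=True, check=False)
    # pgrep prints one PID per line and exits non-zero when nothing matches.
    return [int(pid) for pid in result.stdout.split()]


def reload_config(pid: int) -> None:
    """Ask the process with this PID to reload its configuration (POSIX only)."""
    os.kill(pid, signal.SIGHUP)
```

After a reload, the `prometheus_config_last_reload_successful` metric should read `1` if the new configuration was accepted. If the file is invalid, Prometheus logs an error and keeps running with its previous configuration.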
					
						
## Shutting down your instance gracefully

Although Prometheus has recovery mechanisms for abrupt process failures, it is
recommended to use signals or interrupts to shut down a Prometheus instance
cleanly. On Linux, you can do this by sending the `SIGTERM` or `SIGINT` signal
to the Prometheus process. For example, you can use `kill -s <SIGNAL> <PID>`,
replacing `<SIGNAL>` with the signal name and `<PID>` with the Prometheus
process ID. Alternatively, you can press the interrupt character at the
controlling terminal, which by default is `^C` (Control-C).
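
For a Prometheus process launched from a Python script, a graceful shutdown can be sketched with `subprocess.Popen.send_signal`: send `SIGTERM` (or `SIGINT`, which is what Control-C delivers), then wait for the process to exit so it can finish its cleanup. The helper name and the 60-second default timeout are arbitrary choices for illustration, not Prometheus recommendations, and the sketch assumes a POSIX system.

```python
import signal
import subprocess


def stop_gracefully(proc: subprocess.Popen, sig: int = signal.SIGTERM,
                    timeout: float = 60.0) -> int:
    """Send a shutdown signal and wait for the process to exit.

    Returns the exit code. Raises subprocess.TimeoutExpired if the process
    is still running after `timeout` seconds. Escalating to SIGKILL is left
    to the caller on purpose, because it skips the clean shutdown.
    """
    proc.send_signal(sig)
    return proc.wait(timeout=timeout)
```

If Prometheus was started with `proc = subprocess.Popen(["./prometheus", "--config.file=prometheus.yml"])`, a later `stop_gracefully(proc)` returns the exit code Prometheus reports once its shutdown completes.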