291 lines
8.7 KiB
Markdown
291 lines
8.7 KiB
Markdown
---
|
||
navigation_title: "Bucket correlation"
|
||
mapped_pages:
|
||
- https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-correlation-aggregation.html
|
||
---
|
||
|
||
# Bucket correlation aggregation [search-aggregations-bucket-correlation-aggregation]
|
||
|
||
|
||
A sibling pipeline aggregation which executes a correlation function on the configured sibling multi-bucket aggregation.
|
||
|
||
## Parameters [bucket-correlation-agg-syntax]
|
||
|
||
### `buckets_path`
|
||
|
||
(Required, string) Path to the buckets that contain one set of values to correlate. For syntax, see [`buckets_path` Syntax](/reference/aggregations/pipeline.md#buckets-path-syntax).
|
||
|
||
### `function`
|
||
|
||
(Required, object) The correlation function to execute.
|
||
|
||
#### `count_correlation`
|
||
|
||
(Required*, object) The configuration to calculate a count correlation. This function is designed for determining the correlation of a term value and a given metric. Consequently, it needs to meet the following requirements.
|
||
|
||
* The `buckets_path` must point to a `_count` metric.
|
||
* The total count of all the `bucket_path` count values must be less than or equal to `indicator.doc_count`.
|
||
* When utilizing this function, an initial calculation to gather the required `indicator` values is required.
|
||
|
||
`indicator`
|
||
: (Required, object) The indicator with which to correlate the configured `bucket_path` values.
|
||
|
||
`doc_count`
|
||
: (Required, integer) The total number of documents that initially created the `expectations`. It’s required to be greater than or equal to the sum of all values in the `buckets_path` as this is the originating superset of data to which the term values are correlated.
|
||
|
||
`expectations`
|
||
: (Required, array) An array of numbers with which to correlate the configured `bucket_path` values. The length of this value must always equal the number of buckets returned by the `bucket_path`.
|
||
|
||
`fractions`
|
||
: (Optional, array) An array of fractions to use when averaging and calculating variance. This should be used if the pre-calculated data and the `buckets_path` have known gaps. The length of `fractions`, if provided, must equal `expectations`.
|
||
|
||
## Syntax [_syntax_8]
|
||
|
||
A `bucket_correlation` aggregation looks like this in isolation:
|
||
|
||
```js
|
||
{
|
||
"bucket_correlation": {
|
||
"buckets_path": "range_values>_count", <1>
|
||
"function": {
|
||
"count_correlation": { <2>
|
||
"indicator": {
|
||
"expectations": [...],
|
||
"doc_count": 10000
|
||
}
|
||
}
|
||
}
|
||
}
|
||
}
|
||
```
|
||
|
||
1. The buckets containing the values to correlate against.
|
||
2. The correlation function definition.
|
||
|
||
|
||
|
||
## Example [bucket-correlation-agg-example]
|
||
|
||
The following snippet correlates the individual terms in the field `version` with the `latency` metric. Not shown is the pre-calculation of the `latency` indicator values, which was done utilizing the [percentiles](/reference/aggregations/search-aggregations-metrics-percentile-aggregation.md) aggregation.
|
||
|
||
This example is only using the 10s percentiles.
|
||
|
||
```console
|
||
POST correlate_latency/_search?size=0&filter_path=aggregations
|
||
{
|
||
"aggs": {
|
||
"buckets": {
|
||
"terms": { <1>
|
||
"field": "version",
|
||
"size": 2
|
||
},
|
||
"aggs": {
|
||
"latency_ranges": {
|
||
"range": { <2>
|
||
"field": "latency",
|
||
"ranges": [
|
||
{ "to": 0.0 },
|
||
{ "from": 0, "to": 105 },
|
||
{ "from": 105, "to": 225 },
|
||
{ "from": 225, "to": 445 },
|
||
{ "from": 445, "to": 665 },
|
||
{ "from": 665, "to": 885 },
|
||
{ "from": 885, "to": 1115 },
|
||
{ "from": 1115, "to": 1335 },
|
||
{ "from": 1335, "to": 1555 },
|
||
{ "from": 1555, "to": 1775 },
|
||
{ "from": 1775 }
|
||
]
|
||
}
|
||
},
|
||
"bucket_correlation": { <3>
|
||
"bucket_correlation": {
|
||
"buckets_path": "latency_ranges>_count",
|
||
"function": {
|
||
"count_correlation": {
|
||
"indicator": {
|
||
"expectations": [0, 52.5, 165, 335, 555, 775, 1000, 1225, 1445, 1665, 1775],
|
||
"doc_count": 200
|
||
}
|
||
}
|
||
}
|
||
}
|
||
}
|
||
}
|
||
}
|
||
}
|
||
}
|
||
```
|
||
|
||
1. The term buckets containing a range aggregation and the bucket correlation aggregation. Both are utilized to calculate the correlation of the term values with the latency.
|
||
2. The range aggregation on the latency field. The ranges were created referencing the percentiles of the latency field.
|
||
3. The bucket correlation aggregation that calculates the correlation of the number of term values within each range and the previously calculated indicator values.
|
||
|
||
|
||
And the following may be the response:
|
||
|
||
```console-result
|
||
{
|
||
"aggregations" : {
|
||
"buckets" : {
|
||
"doc_count_error_upper_bound" : 0,
|
||
"sum_other_doc_count" : 0,
|
||
"buckets" : [
|
||
{
|
||
"key" : "1.0",
|
||
"doc_count" : 100,
|
||
"latency_ranges" : {
|
||
"buckets" : [
|
||
{
|
||
"key" : "*-0.0",
|
||
"to" : 0.0,
|
||
"doc_count" : 0
|
||
},
|
||
{
|
||
"key" : "0.0-105.0",
|
||
"from" : 0.0,
|
||
"to" : 105.0,
|
||
"doc_count" : 1
|
||
},
|
||
{
|
||
"key" : "105.0-225.0",
|
||
"from" : 105.0,
|
||
"to" : 225.0,
|
||
"doc_count" : 9
|
||
},
|
||
{
|
||
"key" : "225.0-445.0",
|
||
"from" : 225.0,
|
||
"to" : 445.0,
|
||
"doc_count" : 0
|
||
},
|
||
{
|
||
"key" : "445.0-665.0",
|
||
"from" : 445.0,
|
||
"to" : 665.0,
|
||
"doc_count" : 0
|
||
},
|
||
{
|
||
"key" : "665.0-885.0",
|
||
"from" : 665.0,
|
||
"to" : 885.0,
|
||
"doc_count" : 0
|
||
},
|
||
{
|
||
"key" : "885.0-1115.0",
|
||
"from" : 885.0,
|
||
"to" : 1115.0,
|
||
"doc_count" : 10
|
||
},
|
||
{
|
||
"key" : "1115.0-1335.0",
|
||
"from" : 1115.0,
|
||
"to" : 1335.0,
|
||
"doc_count" : 20
|
||
},
|
||
{
|
||
"key" : "1335.0-1555.0",
|
||
"from" : 1335.0,
|
||
"to" : 1555.0,
|
||
"doc_count" : 20
|
||
},
|
||
{
|
||
"key" : "1555.0-1775.0",
|
||
"from" : 1555.0,
|
||
"to" : 1775.0,
|
||
"doc_count" : 20
|
||
},
|
||
{
|
||
"key" : "1775.0-*",
|
||
"from" : 1775.0,
|
||
"doc_count" : 20
|
||
}
|
||
]
|
||
},
|
||
"bucket_correlation" : {
|
||
"value" : 0.8402398981360937
|
||
}
|
||
},
|
||
{
|
||
"key" : "2.0",
|
||
"doc_count" : 100,
|
||
"latency_ranges" : {
|
||
"buckets" : [
|
||
{
|
||
"key" : "*-0.0",
|
||
"to" : 0.0,
|
||
"doc_count" : 0
|
||
},
|
||
{
|
||
"key" : "0.0-105.0",
|
||
"from" : 0.0,
|
||
"to" : 105.0,
|
||
"doc_count" : 19
|
||
},
|
||
{
|
||
"key" : "105.0-225.0",
|
||
"from" : 105.0,
|
||
"to" : 225.0,
|
||
"doc_count" : 11
|
||
},
|
||
{
|
||
"key" : "225.0-445.0",
|
||
"from" : 225.0,
|
||
"to" : 445.0,
|
||
"doc_count" : 20
|
||
},
|
||
{
|
||
"key" : "445.0-665.0",
|
||
"from" : 445.0,
|
||
"to" : 665.0,
|
||
"doc_count" : 20
|
||
},
|
||
{
|
||
"key" : "665.0-885.0",
|
||
"from" : 665.0,
|
||
"to" : 885.0,
|
||
"doc_count" : 20
|
||
},
|
||
{
|
||
"key" : "885.0-1115.0",
|
||
"from" : 885.0,
|
||
"to" : 1115.0,
|
||
"doc_count" : 10
|
||
},
|
||
{
|
||
"key" : "1115.0-1335.0",
|
||
"from" : 1115.0,
|
||
"to" : 1335.0,
|
||
"doc_count" : 0
|
||
},
|
||
{
|
||
"key" : "1335.0-1555.0",
|
||
"from" : 1335.0,
|
||
"to" : 1555.0,
|
||
"doc_count" : 0
|
||
},
|
||
{
|
||
"key" : "1555.0-1775.0",
|
||
"from" : 1555.0,
|
||
"to" : 1775.0,
|
||
"doc_count" : 0
|
||
},
|
||
{
|
||
"key" : "1775.0-*",
|
||
"from" : 1775.0,
|
||
"doc_count" : 0
|
||
}
|
||
]
|
||
},
|
||
"bucket_correlation" : {
|
||
"value" : -0.5759855613334943
|
||
}
|
||
}
|
||
]
|
||
}
|
||
}
|
||
}
|
||
```
|
||
|
||
|