elasticsearch/docs/reference/aggregations/pipeline/percentiles-bucket-aggregat...

Ignoring revisions in .git-blame-ignore-revs. Click here to bypass and see the normal blame view.

135 lines
4.2 KiB
Plaintext
Raw Normal View History

[[search-aggregations-pipeline-percentiles-bucket-aggregation]]
=== Percentiles bucket aggregation
++++
<titleabbrev>Percentiles bucket</titleabbrev>
++++
A sibling pipeline aggregation which calculates percentiles across all bucket of a specified metric in a sibling aggregation.
The specified metric must be numeric and the sibling aggregation must be a multi-bucket aggregation.
==== Syntax
A `percentiles_bucket` aggregation looks like this in isolation:
[source,js]
--------------------------------------------------
{
"percentiles_bucket": {
"buckets_path": "the_sum"
}
}
--------------------------------------------------
// NOTCONSOLE
[[percentiles-bucket-params]]
.`percentiles_bucket` Parameters
[options="header"]
|===
|Parameter Name |Description |Required |Default Value
|`buckets_path` |The path to the buckets we wish to find the percentiles for (see <<buckets-path-syntax>> for more
details) |Required |
|`gap_policy` |The policy to apply when gaps are found in the data (see <<gap-policy>> for more
details)|Optional | `skip`
|`format` |format to apply to the output value of this aggregation |Optional | `null`
|`percents` |The list of percentiles to calculate |Optional | `[ 1, 5, 25, 50, 75, 95, 99 ]`
|`keyed` |Flag which returns the range as an hash instead of an array of key-value pairs |Optional | `true`
|===
The following snippet calculates the percentiles for the total monthly `sales` buckets:
[source,console]
--------------------------------------------------
POST /sales/_search
{
"size": 0,
"aggs": {
"sales_per_month": {
"date_histogram": {
"field": "date",
"calendar_interval": "month"
},
"aggs": {
"sales": {
"sum": {
"field": "price"
}
}
}
},
"percentiles_monthly_sales": {
"percentiles_bucket": {
"buckets_path": "sales_per_month>sales", <1>
"percents": [ 25.0, 50.0, 75.0 ] <2>
}
}
}
}
--------------------------------------------------
// TEST[setup:sales]
<1> `buckets_path` instructs this percentiles_bucket aggregation that we want to calculate percentiles for
the `sales` aggregation in the `sales_per_month` date histogram.
2016-11-07 21:55:07 +08:00
<2> `percents` specifies which percentiles we wish to calculate, in this case, the 25th, 50th and 75th percentiles.
And the following may be the response:
[source,console-result]
--------------------------------------------------
{
"took": 11,
"timed_out": false,
"_shards": ...,
"hits": ...,
"aggregations": {
"sales_per_month": {
"buckets": [
{
"key_as_string": "2015/01/01 00:00:00",
"key": 1420070400000,
"doc_count": 3,
"sales": {
"value": 550.0
}
},
{
"key_as_string": "2015/02/01 00:00:00",
"key": 1422748800000,
"doc_count": 2,
"sales": {
"value": 60.0
}
},
{
"key_as_string": "2015/03/01 00:00:00",
"key": 1425168000000,
"doc_count": 2,
"sales": {
"value": 375.0
}
}
]
},
"percentiles_monthly_sales": {
"values" : {
2016-11-07 21:55:07 +08:00
"25.0": 375.0,
"50.0": 375.0,
"75.0": 550.0
}
}
}
}
--------------------------------------------------
// TESTRESPONSE[s/"took": 11/"took": $body.took/]
// TESTRESPONSE[s/"_shards": \.\.\./"_shards": $body._shards/]
// TESTRESPONSE[s/"hits": \.\.\./"hits": $body.hits/]
==== Percentiles_bucket implementation
The Percentile Bucket returns the nearest input data point that is not greater than the requested percentile; it does not
interpolate between data points.
The percentiles are calculated exactly and is not an approximation (unlike the Percentiles Metric). This means
the implementation maintains an in-memory, sorted list of your data to compute the percentiles, before discarding the
2021-03-31 21:57:47 +08:00
data. You may run into memory pressure issues if you attempt to calculate percentiles over many millions of
data-points in a single `percentiles_bucket`.