[[search-aggregations-metrics-extendedstats-aggregation]]
=== Extended stats aggregation
++++
<titleabbrev>Extended stats</titleabbrev>
++++
A `multi-value` metrics aggregation that computes stats over numeric values extracted from the aggregated documents. These values can be extracted either from specific numeric fields in the documents, or be generated by a provided script.

The `extended_stats` aggregation is an extended version of the <<search-aggregations-metrics-stats-aggregation,`stats`>> aggregation, which adds metrics such as `sum_of_squares`, `variance`, `std_deviation` and `std_deviation_bounds`.

Assuming the data consists of documents representing exam grades (between 0 and 100) of students, we can compute extended statistics over the grades:

[source,console]
--------------------------------------------------
GET /exams/_search
{
  "size": 0,
  "aggs": {
    "grades_stats": { "extended_stats": { "field": "grade" } }
  }
}
--------------------------------------------------
// TEST[setup:exams]
The above aggregation computes the grade statistics over all documents. The aggregation type is `extended_stats` and the `field` setting defines the numeric field of the documents the stats will be computed on.

The `std_deviation` and `variance` are calculated as population metrics, so they are always the same as `std_deviation_population` and `variance_population` respectively. The `_sampling` variants divide by `count - 1` instead of `count`.

The above request returns the following:

[source,console-result]
--------------------------------------------------
{
  ...
  "aggregations": {
    "grades_stats": {
      "count": 2,
      "min": 50.0,
      "max": 100.0,
      "avg": 75.0,
      "sum": 150.0,
      "sum_of_squares": 12500.0,
      "variance": 625.0,
      "variance_population": 625.0,
      "variance_sampling": 1250.0,
      "std_deviation": 25.0,
      "std_deviation_population": 25.0,
      "std_deviation_sampling": 35.35533905932738,
      "std_deviation_bounds": {
        "upper": 125.0,
        "lower": 25.0,
        "upper_population": 125.0,
        "lower_population": 25.0,
        "upper_sampling": 145.71067811865476,
        "lower_sampling": 4.289321881345245
      }
    }
  }
}
--------------------------------------------------
// TESTRESPONSE[s/\.\.\./"took": $body.took,"timed_out": false,"_shards": $body._shards,"hits": $body.hits,/]
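The response values can be checked by hand. Since `count` is 2, `min` is 50 and `max` is 100, the two grades in the example index are 50 and 100. The Python sketch below recomputes every field of the response from those two values. It is a minimal illustration of the arithmetic, not Elasticsearch's implementation, and the helper name `extended_stats` and its `sigma` argument (defaulting to 2, matching the default bounds) are assumptions. Population variance is `sum_of_squares / count - avg * avg`, sampling variance divides the squared deviations by `count - 1`, and each bound is `avg +/- sigma * std_deviation`:

[source,python]
--------------------------------------------------
import math


def extended_stats(values, sigma=2.0):
    """Hypothetical helper mirroring the fields of an extended_stats response.

    A sketch for checking the arithmetic, not Elasticsearch's implementation.
    Assumes at least two values, so the sampling variance is defined.
    """
    count = len(values)
    total = sum(values)
    avg = total / count
    sum_of_squares = sum(v * v for v in values)
    # Population variance: mean of the squares minus the square of the mean.
    variance_population = sum_of_squares / count - avg * avg
    # Sampling variance divides by (count - 1) instead of count.
    variance_sampling = variance_population * count / (count - 1)
    std_population = math.sqrt(variance_population)
    std_sampling = math.sqrt(variance_sampling)
    return {
        "count": count,
        "min": min(values),
        "max": max(values),
        "avg": avg,
        "sum": total,
        "sum_of_squares": sum_of_squares,
        "variance": variance_population,
        "variance_population": variance_population,
        "variance_sampling": variance_sampling,
        "std_deviation": std_population,
        "std_deviation_population": std_population,
        "std_deviation_sampling": std_sampling,
        "std_deviation_bounds": {
            "upper": avg + sigma * std_population,
            "lower": avg - sigma * std_population,
            "upper_population": avg + sigma * std_population,
            "lower_population": avg - sigma * std_population,
            "upper_sampling": avg + sigma * std_sampling,
            "lower_sampling": avg - sigma * std_sampling,
        },
    }


stats = extended_stats([50.0, 100.0])
print(stats["variance_sampling"])  # 1250.0
print(stats["std_deviation_bounds"]["upper"])  # 125.0
--------------------------------------------------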
The name of the aggregation (`grades_stats` above) also serves as the key by which the aggregation result can be retrieved from the returned response.

==== Standard Deviation Bounds
By default, the `extended_stats` aggregation returns an object called `std_deviation_bounds`, which provides an interval of plus/minus two standard
deviations from the mean. This can be a useful way to visualize the variance of your data. If you want a different boundary, for example
three standard deviations, you can set `sigma` in the request:

[source,console]
--------------------------------------------------
GET /exams/_search
{
  "size": 0,
  "aggs": {
    "grades_stats": {
      "extended_stats": {
        "field": "grade",
        "sigma": 3 <1>
      }
    }
  }
}
--------------------------------------------------
// TEST[setup:exams]
<1> `sigma` sets how many standard deviations above and below the mean the bounds extend.

`sigma` can be any non-negative double, meaning you can request non-integer values such as `1.5`. A value of `0` is valid, but will simply
return the average for both `upper` and `lower` bounds.

The `upper` and `lower` bounds are calculated as population metrics, so they are always the same as `upper_population` and
`lower_population` respectively.

.Standard Deviation and Bounds require normality
[NOTE]
=====
The standard deviation and its bounds are displayed by default, but they are not applicable to all data sets. Your data must
be normally distributed for the metrics to make sense. The statistics behind standard deviations assume normally distributed data, so
if your data is skewed heavily left or right, the values returned will be misleading.
=====
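The bounds are the average shifted by `sigma` standard deviations in each direction, which also shows why skewed data is a problem. The Python sketch below is an illustration under that assumption, using the population standard deviation as the `upper` and `lower` bounds do; the helper `bounds` and the skewed sample are hypothetical. For the two example grades (50 and 100), the default gives the 25 and 125 seen in the response, a `sigma` of 3 gives 0 and 150, and a `sigma` of 0 collapses both bounds to the average of 75. For a heavily right-skewed set of grades, the default lower bound falls to -60, a grade no student can receive:

[source,python]
--------------------------------------------------
import math


def bounds(values, sigma=2.0):
    """Hypothetical helper: (lower, upper) = avg -/+ sigma * population std deviation."""
    count = len(values)
    avg = sum(values) / count
    # Population variance: average squared distance from the mean.
    variance = sum((v - avg) ** 2 for v in values) / count
    std = math.sqrt(variance)
    return avg - sigma * std, avg + sigma * std


grades = [50.0, 100.0]
print(bounds(grades))           # (25.0, 125.0), matching the default response
print(bounds(grades, sigma=3))  # (0.0, 150.0)
print(bounds(grades, sigma=0))  # (75.0, 75.0): both bounds equal the average

# Heavily right-skewed grades: most students scored 0, one scored 100.
skewed = [0.0, 0.0, 0.0, 0.0, 100.0]
print(bounds(skewed))           # (-60.0, 100.0): the lower bound is not a valid grade
--------------------------------------------------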
==== Script
Computing the grade stats based on a script:

[source,console]
--------------------------------------------------
GET /exams/_search
{
  "size": 0,
  "aggs": {
    "grades_stats": {
      "extended_stats": {
        "script": {
          "source": "doc['grade'].value",
          "lang": "painless"
        }
      }
    }
  }
}
--------------------------------------------------
// TEST[setup:exams]
This will interpret the `script` parameter as an `inline` script with the `painless` script language and no script parameters. To use a stored script, use the following syntax:

[source,console]
--------------------------------------------------
GET /exams/_search
{
  "size": 0,
  "aggs": {
    "grades_stats": {
      "extended_stats": {
        "script": {
          "id": "my_script",
          "params": {
            "field": "grade"
          }
        }
      }
    }
  }
}
--------------------------------------------------
// TEST[setup:exams,stored_example_script]
===== Value Script

It turned out that the exam was well above the level of the students, and a grade correction needs to be applied. We can use a value script to get the new stats. When both `field` and `script` are set, the script is run on each value of the field, which is available in the script as `_value`:

[source,console]
--------------------------------------------------
GET /exams/_search
{
  "size": 0,
  "aggs": {
    "grades_stats": {
      "extended_stats": {
        "field": "grade",
        "script": {
          "lang": "painless",
          "source": "_value * params.correction",
          "params": {
            "correction": 1.2
          }
        }
      }
    }
  }
}
--------------------------------------------------
// TEST[setup:exams]
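Because the value script multiplies every grade by the same factor, the corrected statistics can be predicted from the original ones: `sum`, `avg`, `std_deviation` and the bounds scale by the factor, while `variance` and `sum_of_squares` scale by its square. The Python sketch below checks this on the two example grades (50 and 100) with a factor of 1.2. It only reproduces the arithmetic of `_value * params.correction` and says nothing about how Elasticsearch runs Painless; the helper `population_stats` is hypothetical:

[source,python]
--------------------------------------------------
import math

CORRECTION = 1.2  # mirrors params.correction in the request above


def population_stats(values):
    """Hypothetical helper returning (avg, variance, std_deviation), population-style."""
    count = len(values)
    avg = sum(values) / count
    variance = sum((v - avg) ** 2 for v in values) / count
    return avg, variance, math.sqrt(variance)


grades = [50.0, 100.0]
# Equivalent of the value script `_value * params.correction`, applied per value.
corrected = [value * CORRECTION for value in grades]

avg, variance, std = population_stats(grades)
c_avg, c_variance, c_std = population_stats(corrected)

# Multiplying every value by a factor scales the mean and the spread linearly...
print(c_avg, avg * CORRECTION)  # both approximately 90
print(c_std, std * CORRECTION)  # both approximately 30
# ...and scales the variance by the square of the factor.
print(c_variance, variance * CORRECTION ** 2)  # both approximately 900
--------------------------------------------------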
==== Missing value
The `missing` parameter defines how documents that are missing a value should be treated.
By default they are ignored, but it is also possible to treat them as if they
had a value.

[source,console]
--------------------------------------------------
GET /exams/_search
{
  "size": 0,
  "aggs": {
    "grades_stats": {
      "extended_stats": {
        "field": "grade",
        "missing": 0 <1>
      }
    }
  }
}
--------------------------------------------------
// TEST[setup:exams]
<1> Documents without a value in the `grade` field are treated as if they had the value `0`.
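
The effect of `missing` on the statistics can be sketched in Python. The example assumes a hypothetical third exam document with no `grade` field next to the grades 50 and 100; the helper `values_for_stats` is also hypothetical and only models which values feed the statistics, assuming single-valued fields. When the document is ignored (the default), `count` stays at 2 and `avg` at 75; with `"missing": 0` it contributes a `0`, so `count` becomes 3, `avg` drops to 50 and `min` becomes 0:

[source,python]
--------------------------------------------------
def values_for_stats(docs, field, missing=None):
    """Hypothetical helper: collect the values an extended_stats aggregation would see.

    Documents without `field` are skipped unless a `missing` value is supplied,
    in which case that value is used in their place. Assumes single-valued fields.
    """
    values = []
    for doc in docs:
        if field in doc:
            values.append(doc[field])
        elif missing is not None:
            values.append(missing)
    return values


# Two graded exams and one hypothetical exam with no grade recorded.
docs = [{"grade": 50.0}, {"grade": 100.0}, {"student": "no grade"}]

ignored = values_for_stats(docs, "grade")
defaulted = values_for_stats(docs, "grade", missing=0.0)

print(len(ignored), sum(ignored) / len(ignored))        # 2 75.0
print(len(defaulted), sum(defaulted) / len(defaulted))  # 3 50.0
print(min(defaulted))                                   # 0.0
--------------------------------------------------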