[[search-aggregations-metrics-weight-avg-aggregation]]
=== Weighted avg aggregation
++++
<titleabbrev>Weighted avg</titleabbrev>
++++

A `single-value` metrics aggregation that computes the weighted average of numeric values that are extracted from the aggregated documents.
These values can be extracted from specific numeric fields in the documents.

When calculating a regular average, each datapoint has an equal "weight": it contributes equally to the final value. Weighted averages,
on the other hand, weight each datapoint differently. The amount that each datapoint contributes to the final value is extracted from the
document.

As a formula, a weighted average is `∑(value * weight) / ∑(weight)`.

A regular average can be thought of as a weighted average where every value has an implicit weight of `1`.
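
The formula can be checked outside Elasticsearch with a few lines of Python. This is only an illustrative sketch: `weighted_avg` is a hypothetical helper, not part of Elasticsearch or its clients, and the grades are made-up sample data.

```python
# Hypothetical helper illustrating ∑(value * weight) / ∑(weight);
# it is not part of Elasticsearch or its clients.
def weighted_avg(values, weights):
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("sum of weights must be non-zero")
    return sum(v * w for v, w in zip(values, weights)) / total_weight


grades = [90, 60, 75]  # made-up sample data

# Every weight is 1, so this is the regular average.
regular = weighted_avg(grades, [1, 1, 1])

# The first grade counts three times as much as the others.
weighted = weighted_avg(grades, [3, 1, 1])
```

With these sample grades, `regular` is `75.0` and `weighted` is `81.0`, because the heavier weight pulls the result toward the first grade.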

[[weighted-avg-params]]
.`weighted_avg` Parameters
[options="header"]
|===
|Parameter Name |Description |Required |Default Value
|`value` | The configuration for the field or script that provides the values |Required |
|`weight` | The configuration for the field or script that provides the weights |Required |
|`format` | The numeric response formatter |Optional |
|===

The `value` and `weight` objects have per-field specific configuration:

[[value-params]]
.`value` Parameters
[options="header"]
|===
|Parameter Name |Description |Required |Default Value
|`field` | The field that values should be extracted from |Required |
|`missing` | A value to use if the field is missing entirely |Optional |
|===

[[weight-params]]
.`weight` Parameters
[options="header"]
|===
|Parameter Name |Description |Required |Default Value
|`field` | The field that weights should be extracted from |Required |
|`missing` | A weight to use if the field is missing entirely |Optional |
|===

==== Examples

If our documents have a `"grade"` field that holds a 0-100 numeric score, and a `"weight"` field which holds an arbitrary numeric weight,
we can calculate the weighted average using:

[source,console]
--------------------------------------------------
POST /exams/_search
{
  "size": 0,
  "aggs": {
    "weighted_grade": {
      "weighted_avg": {
        "value": {
          "field": "grade"
        },
        "weight": {
          "field": "weight"
        }
      }
    }
  }
}
--------------------------------------------------
// TEST[setup:exams]

Which yields a response like:

[source,console-result]
--------------------------------------------------
{
  ...
  "aggregations": {
    "weighted_grade": {
      "value": 70.0
    }
  }
}
--------------------------------------------------
// TESTRESPONSE[s/\.\.\./"took": $body.took,"timed_out": false,"_shards": $body._shards,"hits": $body.hits,/]

While a document may have multiple values in the `value` field, it may have only one weight. If the aggregation encounters
a document that has more than one weight (e.g. the weight field is a multi-valued field) it will abort the search.
If you have this situation, you should build a <<search-aggregations-metrics-weight-avg-aggregation-runtime-field>>
to combine those values into a single weight.

This single weight will be applied independently to each value extracted from the `value` field.

This example shows how a single document with multiple values will be averaged with a single weight:

[source,console]
--------------------------------------------------
POST /exams/_doc?refresh
{
  "grade": [1, 2, 3],
  "weight": 2
}

POST /exams/_search
{
  "size": 0,
  "aggs": {
    "weighted_grade": {
      "weighted_avg": {
        "value": {
          "field": "grade"
        },
        "weight": {
          "field": "weight"
        }
      }
    }
  }
}
--------------------------------------------------
// TEST

The three values (`1`, `2`, and `3`) will be included as independent values, all with the weight of `2`:

[source,console-result]
--------------------------------------------------
{
  ...
  "aggregations": {
    "weighted_grade": {
      "value": 2.0
    }
  }
}
--------------------------------------------------
// TESTRESPONSE[s/\.\.\./"took": $body.took,"timed_out": false,"_shards": $body._shards,"hits": $body.hits,/]

The aggregation returns `2.0` as the result, which matches what we would expect when calculating by hand:
`((1*2) + (2*2) + (3*2)) / (2+2+2) == 2`
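
The same hand calculation can be written as a small Python sketch. It only mirrors the behavior described in this section and is not how Elasticsearch implements the aggregation; `weighted_avg_docs` and the dictionary layout are assumptions made for illustration.

```python
# Sketch of the behavior described above, not Elasticsearch's implementation.
# Each document is a dict with a "grade" (number or list) and a single "weight".
def weighted_avg_docs(docs):
    weighted_sum = 0.0
    weight_sum = 0.0
    for doc in docs:
        weight = doc["weight"]
        if isinstance(weight, list):
            # Mirrors the documented rule: only one weight per document.
            raise ValueError("a document may have only one weight")
        grade = doc["grade"]
        grades = grade if isinstance(grade, list) else [grade]
        for g in grades:
            # The single weight is applied independently to every value.
            weighted_sum += g * weight
            weight_sum += weight
    return weighted_sum / weight_sum


result = weighted_avg_docs([{"grade": [1, 2, 3], "weight": 2}])
```

Here `result` is `2.0`, the same value the aggregation returns. Passing a document whose `weight` is a list raises an error in this sketch, which loosely corresponds to the search being aborted.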

[[search-aggregations-metrics-weight-avg-aggregation-runtime-field]]
==== Runtime field

If you have to sum or weigh values that don't quite line up with the indexed
values, run the aggregation on a <<runtime,runtime field>>.

[source,console]
----
POST /exams/_doc?refresh
{
  "grade": 100,
  "weight": [2, 3]
}
POST /exams/_doc?refresh
{
  "grade": 80,
  "weight": 3
}

POST /exams/_search?filter_path=aggregations
{
  "size": 0,
  "runtime_mappings": {
    "weight.combined": {
      "type": "double",
      "script": """
        double s = 0;
        for (double w : doc['weight']) {
          s += w;
        }
        emit(s);
      """
    }
  },
  "aggs": {
    "weighted_grade": {
      "weighted_avg": {
        "value": {
          "script": "doc.grade.value + 1"
        },
        "weight": {
          "field": "weight.combined"
        }
      }
    }
  }
}
----

Which should look like:

[source,console-result]
----
{
  "aggregations": {
    "weighted_grade": {
      "value": 93.5
    }
  }
}
----
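
To see where `93.5` comes from, the following Python sketch repeats the arithmetic by hand. It assumes the index holds only the two documents indexed in this example; `combined_weight` is a hypothetical stand-in for the `weight.combined` runtime field, not an Elasticsearch API.

```python
# Sketch only: mimics the runtime field and the value script from the request.
def combined_weight(weight):
    # Like the `weight.combined` runtime field: sum all weights of a document.
    weights = weight if isinstance(weight, list) else [weight]
    return float(sum(weights))


# Assumption: these are the only two documents in the index.
docs = [
    {"grade": 100, "weight": [2, 3]},
    {"grade": 80, "weight": 3},
]

# The value script in the request is `doc.grade.value + 1`.
values = [doc["grade"] + 1 for doc in docs]
weights = [combined_weight(doc["weight"]) for doc in docs]

result = sum(v * w for v, w in zip(values, weights)) / sum(weights)
```

The combined weights are `5` and `3`, so the result is `((101*5) + (81*3)) / (5+3) == 93.5`.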

==== Missing values

The `missing` parameter defines how documents that are missing a value should be treated.
The default behavior is different for `value` and `weight`:

By default, if the `value` field is missing the document is ignored and the aggregation moves on to the next document.
If the `weight` field is missing, it is assumed to have a weight of `1` (like a normal average).

Both of these defaults can be overridden with the `missing` parameter:

[source,console]
--------------------------------------------------
POST /exams/_search
{
  "size": 0,
  "aggs": {
    "weighted_grade": {
      "weighted_avg": {
        "value": {
          "field": "grade",
          "missing": 2
        },
        "weight": {
          "field": "weight",
          "missing": 3
        }
      }
    }
  }
}
--------------------------------------------------
// TEST[setup:exams]
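
The effect of the defaults and of the `missing` overrides can be sketched in Python. This is a rough model of the rules stated above, not Elasticsearch's implementation: `weighted_avg_with_missing`, its parameter names, and the sample documents are all assumptions made for illustration.

```python
# Sketch of the documented defaults, not Elasticsearch's implementation.
# `value_missing` and `weight_missing` stand in for the two `missing` parameters.
def weighted_avg_with_missing(docs, value_missing=None, weight_missing=1):
    weighted_sum = 0.0
    weight_sum = 0.0
    for doc in docs:
        value = doc.get("grade", value_missing)
        if value is None:
            # No value and no `missing` override: the document is ignored.
            continue
        # A missing weight defaults to 1 unless overridden.
        weight = doc.get("weight", weight_missing)
        weighted_sum += value * weight
        weight_sum += weight
    return weighted_sum / weight_sum


# Made-up sample documents.
docs = [
    {"grade": 4, "weight": 2},
    {"weight": 5},  # no grade
    {"grade": 10},  # no weight
]

defaults = weighted_avg_with_missing(docs)
overridden = weighted_avg_with_missing(docs, value_missing=2, weight_missing=3)
```

With the defaults, the second document is skipped and the third gets a weight of `1`, giving `(4*2 + 10*1) / (2+1) == 6.0`. With `missing` set to `2` for the value and `3` for the weight, all three documents count, giving `(4*2 + 2*5 + 10*3) / (2+5+3) == 4.8`.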