[discrete]
[[esql-stats-by]]
=== `STATS ... BY`

**Syntax**

[source,esql]
----
STATS [column1 =] expression1[, ..., [columnN =] expressionN] [BY grouping_column1[, ..., grouping_columnN]]
----

*Parameters*

`columnX`::
The name by which the aggregated value is returned. If omitted, the name is
equal to the corresponding expression (`expressionX`).

`expressionX`::
An expression that computes an aggregated value.

`grouping_columnX`::
The column containing the values to group by.

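As a minimal sketch of how `columnX` affects the output, assume a hypothetical
`employees` index with numeric `salary` and `languages` columns (these names are
illustrative and are not taken from this page). The first query names the
aggregated column explicitly:

[source,esql]
----
// Assumed index and column names, for illustration only.
// The aggregated value is returned in a column named avg_salary.
FROM employees
| STATS avg_salary = AVG(salary) BY languages
----

The second query omits the name, so the column takes the name of the expression:

[source,esql]
----
// No name given: the column is named after the expression, AVG(salary).
FROM employees
| STATS AVG(salary) BY languages
----
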
*Description*

The `STATS ... BY` processing command groups rows according to a common value
and calculates one or more aggregated values over the grouped rows. If `BY` is
omitted, the output table contains exactly one row with the aggregations applied
over the entire dataset.

The following aggregation functions are supported:

include::../functions/aggregation-functions.asciidoc[tag=agg_list]

NOTE: `STATS` without any groups is much faster than `STATS` with a group.

NOTE: Grouping on a single column is currently much more optimized than grouping
on many columns. In some tests, grouping on a single `keyword` column was five
times faster than grouping on two `keyword` columns. Do not try to work around
this by combining the two columns with something like <<esql-concat>> and then
grouping on the result. That is not going to be faster.

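To make the note on multi-column grouping concrete, the following sketch
contrasts the two query shapes. It assumes a hypothetical `employees` index with
`keyword` columns `first_name` and `last_name` and an `emp_no` column; none of
these names come from this page. Grouping directly on both columns is the form
to keep:

[source,esql]
----
// Assumed index and column names, for illustration only.
FROM employees
| STATS count = COUNT(emp_no) BY first_name, last_name
----

The concatenation workaround that the note advises against would look something
like this. Per the note, it should not be expected to run any faster:

[source,esql]
----
// Discouraged: combining the columns before grouping.
FROM employees
| EVAL full_name = CONCAT(first_name, " ", last_name)
| STATS count = COUNT(emp_no) BY full_name
----
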
*Examples*

Calculating a statistic and grouping by the values of another column:

[source.merge.styled,esql]
----
include::{esql-specs}/docs.csv-spec[tag=stats]
----
[%header.monospaced.styled,format=dsv,separator=|]
|===
include::{esql-specs}/docs.csv-spec[tag=stats-result]
|===

Omitting `BY` returns one row with the aggregations applied over the entire
dataset:

[source.merge.styled,esql]
----
include::{esql-specs}/docs.csv-spec[tag=statsWithoutBy]
----
[%header.monospaced.styled,format=dsv,separator=|]
|===
include::{esql-specs}/docs.csv-spec[tag=statsWithoutBy-result]
|===

It's possible to calculate multiple values:

[source,esql]
----
include::{esql-specs}/docs.csv-spec[tag=statsCalcMultipleValues]
----

It's also possible to group by multiple values (only supported for long and
keyword family fields):

[source,esql]
----
include::{esql-specs}/docs.csv-spec[tag=statsGroupByMultipleValues]
----