[9.0] ESQL autogenerate docs v3 (#124312) (#124786)

Manual backport of https://github.com/elastic/elasticsearch/pull/124312
and https://github.com/elastic/elasticsearch/pull/124742
Craig Taverner 2025-03-13 20:33:18 +01:00 committed by GitHub
parent ed93e24195
commit d3d9a00fb1
879 changed files with 3433 additions and 18054 deletions

.gitattributes

@ -13,6 +13,9 @@ x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/parser/EsqlBasePar
x-pack/plugin/esql/src/main/generated/** linguist-generated=true
x-pack/plugin/esql/src/main/generated-src/** linguist-generated=true
# ESQL functions docs are autogenerated. More information at `docs/reference/query-languages/esql/README.md`
docs/reference/query-languages/esql/_snippets/functions/*/** linguist-generated=true
#docs/reference/query-languages/esql/_snippets/operators/*/** linguist-generated=true
docs/reference/query-languages/esql/images/** linguist-generated=true
docs/reference/query-languages/esql/kibana/** linguist-generated=true


@ -2,8 +2,8 @@ project: 'Elasticsearch'
exclude:
- README.md
- internal/*
- reference/query-languages/esql/kibana/docs/**
- reference/query-languages/esql/README.md
cross_links:
- beats
- cloud


@ -0,0 +1,17 @@
* configurable precision, which decides on how to trade memory for accuracy,
* excellent accuracy on low-cardinality sets,
* fixed memory usage: no matter if there are tens or billions of unique values, memory usage only depends on the configured precision.
For a precision threshold of `c`, the implementation that we are using requires about `c * 8` bytes.
The following chart shows how the error varies before and after the threshold:
![cardinality error](/images/cardinality_error.png "")
For all 3 thresholds, counts have been accurate up to the configured threshold. Although not guaranteed,
this is likely to be the case. Accuracy in practice depends on the dataset in question. In general,
most datasets show consistently good accuracy. Also note that even with a threshold as low as 100,
the error remains very low (1-6% as seen in the above graph) even when counting millions of items.
The HyperLogLog++ algorithm depends on the leading zeros of hashed values, so the exact
distribution of hashes in a dataset can affect the accuracy of the cardinality.
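The fixed-memory property can be sketched with a toy estimator (illustrative Python only; `hll_estimate` is a hypothetical helper, not the HyperLogLog++ implementation Elasticsearch uses, which adds bias correction and sparse encoding):

```python
import hashlib

def hll_estimate(values, p: int = 8) -> float:
    """Toy HyperLogLog-style cardinality estimate. Memory is fixed:
    2**p registers, no matter how many values are counted."""
    m = 1 << p                      # number of registers
    registers = [0] * m
    for v in values:
        # 64-bit hash of the value
        h = int.from_bytes(hashlib.sha1(str(v).encode()).digest()[:8], "big")
        idx = h >> (64 - p)         # first p bits pick a register
        rest = h & ((1 << (64 - p)) - 1)
        # rank = number of leading zeros in the remaining bits, plus one
        rank = (64 - p) - rest.bit_length() + 1
        registers[idx] = max(registers[idx], rank)
    alpha = 0.7213 / (1 + 1.079 / m)
    return alpha * m * m / sum(2.0 ** -r for r in registers)
```

With `p = 8` the sketch uses 256 registers regardless of input size, and the estimate for thousands of distinct values typically lands within a few percent of the true count.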


@ -1,60 +1,3 @@
## `PERCENTILE` [esql-percentile]
**Syntax**
:::{image} ../../../../../images/percentile.svg
:alt: Embedded
:class: text-center
:::
**Parameters**
true
**Description**
Returns the value at which a certain percentage of observed values occur. For example, the 95th percentile is the value which is greater than 95% of the observed values and the 50th percentile is the `MEDIAN`.
**Supported types**
| number | percentile | result |
| --- | --- | --- |
| double | double | double |
| double | integer | double |
| double | long | double |
| integer | double | double |
| integer | integer | double |
| integer | long | double |
| long | double | double |
| long | integer | double |
| long | long | double |
**Examples**
```esql
FROM employees
| STATS p0 = PERCENTILE(salary, 0)
, p50 = PERCENTILE(salary, 50)
, p99 = PERCENTILE(salary, 99)
```
| p0:double | p50:double | p99:double |
| --- | --- | --- |
| 25324 | 47003 | 74970.29 |
The expression can use inline functions. For example, to calculate a percentile of the maximum values of a multivalued column, first use `MV_MAX` to get the maximum value per row, and use the result with the `PERCENTILE` function
```esql
FROM employees
| STATS p80_max_salary_change = PERCENTILE(MV_MAX(salary_change), 80)
```
| p80_max_salary_change:double |
| --- |
| 12.132 |
### `PERCENTILE` is (usually) approximate [esql-percentile-approximate]
There are many different algorithms to calculate percentiles. The naive implementation simply stores all the values in a sorted array. To find the 50th percentile, you simply find the value that is at `my_array[count(my_array) * 0.5]`.
Clearly, the naive implementation does not scale: the sorted array grows linearly with the number of values in your dataset. To calculate percentiles across potentially billions of values in an Elasticsearch cluster, *approximate* percentiles are calculated.
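As a sketch (Python for illustration; `naive_percentile` is a hypothetical name), the naive approach looks like this:

```python
def naive_percentile(values, p: float):
    # Naive exact percentile: store every value in a sorted array and
    # index into it. Memory grows linearly with the dataset, which is
    # why Elasticsearch computes approximate percentiles instead.
    ordered = sorted(values)
    idx = int(len(ordered) * p / 100)
    return ordered[min(idx, len(ordered) - 1)]
```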
@ -72,11 +15,3 @@ The following chart shows the relative error on a uniform distribution depending
![percentiles error](/images/percentiles_error.png "")
It shows how precision is better for extreme percentiles. The reason why error diminishes for a large number of values is that the law of large numbers makes the distribution of values more and more uniform and the t-digest tree can do a better job at summarizing it. It would not be the case on more skewed distributions.
::::{warning}
`PERCENTILE` is also [non-deterministic](https://en.wikipedia.org/wiki/Nondeterministic_algorithm). This means you can get slightly different results using the same data.
::::


@ -65,19 +65,8 @@ Computing exact counts requires loading values into a hash set and returning its
This `cardinality` aggregation is based on the [HyperLogLog++](https://static.googleusercontent.com/media/research.google.com/fr//pubs/archive/40671.pdf) algorithm, which counts based on the hashes of the values with some interesting properties:
:::{include} _snippets/search-aggregations-metrics-cardinality-aggregation-explanation.md
:::
## Pre-computed hashes [_pre_computed_hashes]


@ -175,31 +175,14 @@ GET latency/_search
## Percentiles are (usually) approximate [search-aggregations-metrics-percentile-aggregation-approximation]
:::{include} /reference/data-analysis/aggregations/_snippets/search-aggregations-metrics-percentile-aggregation-approximate.md
:::
Clearly, the naive implementation does not scale: the sorted array grows linearly with the number of values in your dataset. To calculate percentiles across potentially billions of values in an Elasticsearch cluster, *approximate* percentiles are calculated.
The algorithm used by the `percentile` metric is called TDigest (introduced by Ted Dunning in [Computing Accurate Quantiles using T-Digests](https://github.com/tdunning/t-digest/blob/master/docs/t-digest-paper/histo.pdf)).
When using this metric, there are a few guidelines to keep in mind:
* Accuracy is proportional to `q(1-q)`. This means that extreme percentiles (e.g. 99%) are more accurate than less extreme percentiles, such as the median.
* For small sets of values, percentiles are highly accurate (and potentially 100% accurate if the data is small enough).
* As the quantity of values in a bucket grows, the algorithm begins to approximate the percentiles. It is effectively trading accuracy for memory savings. The exact level of inaccuracy is difficult to generalize, since it depends on your data distribution and the volume of data being aggregated.
The following chart shows the relative error on a uniform distribution depending on the number of collected values and the requested percentile:
![percentiles error](../../../images/percentiles_error.png "")
It shows how precision is better for extreme percentiles. The reason why error diminishes for large number of values is that the law of large numbers makes the distribution of values more and more uniform and the t-digest tree can do a better job at summarizing it. It would not be the case on more skewed distributions.
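The `q(1-q)` relationship is easy to check directly (illustrative Python; `error_weight` is a hypothetical helper, not t-digest code):

```python
def error_weight(q: float) -> float:
    # t-digest's error bound scales with q * (1 - q): it vanishes toward
    # the extremes (q near 0 or 1) and peaks at the median (q = 0.5),
    # which is why extreme percentiles are the most accurate.
    return q * (1 - q)
```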
::::{warning}
Percentile aggregations are also [non-deterministic](https://en.wikipedia.org/wiki/Nondeterministic_algorithm). This means you can get slightly different results using the same data.
::::
## Compression [search-aggregations-metrics-percentile-aggregation-compression]
Approximate algorithms must balance memory utilization with estimation accuracy. This balance can be controlled using a `compression` parameter:


@ -1,23 +0,0 @@
The files in these subdirectories are generated by ESQL's test suite:
* `description` - description of each function scraped from `@FunctionInfo#description`
* `examples` - examples of each function scraped from `@FunctionInfo#examples`
* `parameters` - description of each function's parameters scraped from `@Param`
* `signature` - railroad diagram of the syntax to invoke each function
* `types` - a table of each combination of supported types for each parameter. These are generated from tests.
* `layout` - a fully generated description for each function
* `kibana/definition` - function definitions for kibana's ESQL editor
* `kibana/docs` - the inline docs for kibana
Most functions can use the docs generated in the `layout` directory.
If we need something more custom for the function we can make a file in this
directory that can `include::` any parts of the files above.
To regenerate the files for a function, run its tests using gradle:
```
./gradlew :x-pack:plugin:esql:test -Dtests.class='*SinTests'
```
To regenerate the files for all functions run all of ESQL's tests using gradle:
```
./gradlew :x-pack:plugin:esql:test
```


@ -1,17 +0,0 @@
{
"comment" : "This is generated by ESQL's AbstractFunctionTestCase. Do not edit it. See ../README.md for how to regenerate it.",
"type" : "scalar",
"name" : "pi",
"description" : "Returns Pi, the ratio of a circle's circumference to its diameter.",
"signatures" : [
{
"params" : [ ],
"returnType" : "double"
}
],
"examples" : [
"ROW PI()"
],
"preview" : false,
"snapshot_only" : false
}


@ -1,17 +0,0 @@
{
"comment" : "This is generated by ESQL's AbstractFunctionTestCase. Do not edit it. See ../README.md for how to regenerate it.",
"type" : "scalar",
"name" : "tau",
"description" : "Returns the ratio of a circle's circumference to its radius.",
"signatures" : [
{
"params" : [ ],
"returnType" : "double"
}
],
"examples" : [
"ROW TAU()"
],
"preview" : false,
"snapshot_only" : false
}


@ -1,11 +0,0 @@
<!--
This is generated by ESQL's AbstractFunctionTestCase. Do not edit it. See ../README.md for how to regenerate it.
-->
### ABS
Returns the absolute value.
```
ROW number = -1.0
| EVAL abs_number = ABS(number)
```


@ -1,11 +0,0 @@
<!--
This is generated by ESQL's AbstractFunctionTestCase. Do not edit it. See ../README.md for how to regenerate it.
-->
### AVG
The average of a numeric field.
```
FROM employees
| STATS AVG(height)
```


@ -1,12 +0,0 @@
<!--
This is generated by ESQL's AbstractFunctionTestCase. Do not edit it. See ../README.md for how to regenerate it.
-->
### CIDR_MATCH
Returns true if the provided IP is contained in one of the provided CIDR blocks.
```
FROM hosts
| WHERE CIDR_MATCH(ip1, "127.0.0.2/32", "127.0.0.3/32")
| KEEP card, host, ip0, ip1
```


@ -1,11 +0,0 @@
<!--
This is generated by ESQL's AbstractFunctionTestCase. Do not edit it. See ../README.md for how to regenerate it.
-->
### COS
Returns the cosine of an angle.
```
ROW a=1.8
| EVAL cos=COS(a)
```


@ -1,11 +0,0 @@
<!--
This is generated by ESQL's AbstractFunctionTestCase. Do not edit it. See ../README.md for how to regenerate it.
-->
### COSH
Returns the hyperbolic cosine of a number.
```
ROW a=1.8
| EVAL cosh=COSH(a)
```


@ -1,10 +0,0 @@
<!--
This is generated by ESQL's AbstractFunctionTestCase. Do not edit it. See ../README.md for how to regenerate it.
-->
### E
Returns Euler's number.
```
ROW E()
```


@ -1,11 +0,0 @@
<!--
This is generated by ESQL's AbstractFunctionTestCase. Do not edit it. See ../README.md for how to regenerate it.
-->
### FROM_BASE64
Decode a base64 string.
```
row a = "ZWxhc3RpYw=="
| eval d = from_base64(a)
```


@ -1,14 +0,0 @@
<!--
This is generated by ESQL's AbstractFunctionTestCase. Do not edit it. See ../README.md for how to regenerate it.
-->
### KQL
Performs a KQL query. Returns true if the provided KQL query string matches the row.
```
FROM books
| WHERE KQL("author: Faulkner")
| KEEP book_no, author
| SORT book_no
| LIMIT 5
```


@ -1,14 +0,0 @@
<!--
This is generated by ESQL's AbstractFunctionTestCase. Do not edit it. See ../README.md for how to regenerate it.
-->
### LEFT
Returns the substring that extracts 'length' chars from 'string' starting from the left.
```
FROM employees
| KEEP last_name
| EVAL left = LEFT(last_name, 3)
| SORT last_name ASC
| LIMIT 5
```


@ -1,11 +0,0 @@
<!--
This is generated by ESQL's AbstractFunctionTestCase. Do not edit it. See ../README.md for how to regenerate it.
-->
### MAX
The maximum value of a field.
```
FROM employees
| STATS MAX(languages)
```


@ -1,13 +0,0 @@
<!--
This is generated by ESQL's AbstractFunctionTestCase. Do not edit it. See ../README.md for how to regenerate it.
-->
### MD5
Computes the MD5 hash of the input.
```
FROM sample_data
| WHERE message != "Connection error"
| EVAL md5 = md5(message)
| KEEP message, md5;
```


@ -1,11 +0,0 @@
<!--
This is generated by ESQL's AbstractFunctionTestCase. Do not edit it. See ../README.md for how to regenerate it.
-->
### MIN
The minimum value of a field.
```
FROM employees
| STATS MIN(languages)
```


@ -1,7 +0,0 @@
<!--
This is generated by ESQL's AbstractFunctionTestCase. Do not edit it. See ../README.md for how to regenerate it.
-->
### MV_APPEND
Concatenates values of two multi-value fields.


@ -1,12 +0,0 @@
<!--
This is generated by ESQL's AbstractFunctionTestCase. Do not edit it. See ../README.md for how to regenerate it.
-->
### MV_DEDUPE
Remove duplicate values from a multivalued field.
```
ROW a=["foo", "foo", "bar", "foo"]
| EVAL dedupe_a = MV_DEDUPE(a)
```
Note: `MV_DEDUPE` may, but won't always, sort the values in the column.


@ -1,7 +0,0 @@
<!--
This is generated by ESQL's AbstractFunctionTestCase. Do not edit it. See ../README.md for how to regenerate it.
-->
### NEG
Returns the negation of the argument.


@ -1,10 +0,0 @@
<!--
This is generated by ESQL's AbstractFunctionTestCase. Do not edit it. See ../README.md for how to regenerate it.
-->
### NOT_RLIKE
Use `NOT RLIKE` to filter data based on string patterns using
<<regexp-syntax,regular expressions>>. `NOT RLIKE` usually acts on a field placed on
the left-hand side of the operator, but it can also act on a constant (literal)
expression. The right-hand side of the operator represents the pattern.


@ -1,10 +0,0 @@
<!--
This is generated by ESQL's AbstractFunctionTestCase. Do not edit it. See ../README.md for how to regenerate it.
-->
### NOW
Returns current date and time.
```
ROW current_date = NOW()
```


@ -1,10 +0,0 @@
<!--
This is generated by ESQL's AbstractFunctionTestCase. Do not edit it. See ../README.md for how to regenerate it.
-->
### PI
Returns Pi, the ratio of a circle's circumference to its diameter.
```
ROW PI()
```


@ -1,14 +0,0 @@
<!--
This is generated by ESQL's AbstractFunctionTestCase. Do not edit it. See ../README.md for how to regenerate it.
-->
### RIGHT
Returns the substring that extracts 'length' chars from 'str' starting from the right.
```
FROM employees
| KEEP last_name
| EVAL right = RIGHT(last_name, 3)
| SORT last_name ASC
| LIMIT 5
```


@ -1,13 +0,0 @@
<!--
This is generated by ESQL's AbstractFunctionTestCase. Do not edit it. See ../README.md for how to regenerate it.
-->
### SHA1
Computes the SHA1 hash of the input.
```
FROM sample_data
| WHERE message != "Connection error"
| EVAL sha1 = sha1(message)
| KEEP message, sha1;
```


@ -1,13 +0,0 @@
<!--
This is generated by ESQL's AbstractFunctionTestCase. Do not edit it. See ../README.md for how to regenerate it.
-->
### SHA256
Computes the SHA256 hash of the input.
```
FROM sample_data
| WHERE message != "Connection error"
| EVAL sha256 = sha256(message)
| KEEP message, sha256;
```


@ -1,11 +0,0 @@
<!--
This is generated by ESQL's AbstractFunctionTestCase. Do not edit it. See ../README.md for how to regenerate it.
-->
### SIN
Returns the sine of an angle.
```
ROW a=1.8
| EVAL sin=SIN(a)
```


@ -1,11 +0,0 @@
<!--
This is generated by ESQL's AbstractFunctionTestCase. Do not edit it. See ../README.md for how to regenerate it.
-->
### SINH
Returns the hyperbolic sine of a number.
```
ROW a=1.8
| EVAL sinh=SINH(a)
```


@ -1,11 +0,0 @@
<!--
This is generated by ESQL's AbstractFunctionTestCase. Do not edit it. See ../README.md for how to regenerate it.
-->
### STD_DEV
The standard deviation of a numeric field.
```
FROM employees
| STATS STD_DEV(height)
```


@ -1,11 +0,0 @@
<!--
This is generated by ESQL's AbstractFunctionTestCase. Do not edit it. See ../README.md for how to regenerate it.
-->
### SUM
The sum of a numeric expression.
```
FROM employees
| STATS SUM(languages)
```


@ -1,11 +0,0 @@
<!--
This is generated by ESQL's AbstractFunctionTestCase. Do not edit it. See ../README.md for how to regenerate it.
-->
### TAN
Returns the tangent of an angle.
```
ROW a=1.8
| EVAL tan=TAN(a)
```


@ -1,11 +0,0 @@
<!--
This is generated by ESQL's AbstractFunctionTestCase. Do not edit it. See ../README.md for how to regenerate it.
-->
### TANH
Returns the hyperbolic tangent of a number.
```
ROW a=1.8
| EVAL tanh=TANH(a)
```


@ -1,10 +0,0 @@
<!--
This is generated by ESQL's AbstractFunctionTestCase. Do not edit it. See ../README.md for how to regenerate it.
-->
### TAU
Returns the ratio of a circle's circumference to its radius.
```
ROW TAU()
```


@ -1,13 +0,0 @@
<!--
This is generated by ESQL's AbstractFunctionTestCase. Do not edit it. See ../README.md for how to regenerate it.
-->
### TERM
Performs a Term query on the specified field. Returns true if the provided term matches the row.
```
FROM books
| WHERE TERM(author, "gabriel")
| KEEP book_no, title
| LIMIT 3;
```


@ -1,11 +0,0 @@
<!--
This is generated by ESQL's AbstractFunctionTestCase. Do not edit it. See ../README.md for how to regenerate it.
-->
### TO_BASE64
Encode a string to a base64 string.
```
row a = "elastic"
| eval e = to_base64(a)
```


@ -1,14 +0,0 @@
<!--
This is generated by ESQL's AbstractFunctionTestCase. Do not edit it. See ../README.md for how to regenerate it.
-->
### TO_BOOLEAN
Converts an input value to a boolean value.
A string value of *true* will be converted, case-insensitively, to the Boolean *true*.
For anything else, including the empty string, the function will return *false*.
The numerical value *0* will be converted to *false*; anything else will be converted to *true*.
```
ROW str = ["true", "TRuE", "false", "", "yes", "1"]
| EVAL bool = TO_BOOLEAN(str)
```
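The rules above can be mirrored in a small helper (a sketch of the documented semantics in Python; `to_boolean` here is illustrative, not Elasticsearch code):

```python
def to_boolean(value) -> bool:
    # Strings: only a case-insensitive "true" becomes True; anything
    # else, including the empty string, becomes False.
    if isinstance(value, str):
        return value.lower() == "true"
    # Numbers: 0 becomes False, any other number becomes True.
    return value != 0
```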


@ -1,11 +0,0 @@
<!--
This is generated by ESQL's AbstractFunctionTestCase. Do not edit it. See ../README.md for how to regenerate it.
-->
### TO_STRING
Converts an input value into a string.
```
ROW a=10
| EVAL j = TO_STRING(a)
```


@ -1,10 +0,0 @@
<!--
This is generated by ESQL's AbstractFunctionTestCase. Do not edit it. See ../README.md for how to regenerate it.
-->
### TO_VERSION
Converts an input string to a version value.
```
ROW v = TO_VERSION("1.2.3")
```


@ -1,21 +0,0 @@
{
"bool" : "to_boolean",
"boolean" : "to_boolean",
"cartesian_point" : "to_cartesianpoint",
"cartesian_shape" : "to_cartesianshape",
"date_nanos" : "to_date_nanos",
"date_period" : "to_dateperiod",
"datetime" : "to_datetime",
"double" : "to_double",
"geo_point" : "to_geopoint",
"geo_shape" : "to_geoshape",
"int" : "to_integer",
"integer" : "to_integer",
"ip" : "to_ip",
"keyword" : "to_string",
"long" : "to_long",
"string" : "to_string",
"time_duration" : "to_timeduration",
"unsigned_long" : "to_unsigned_long",
"version" : "to_version"
}


@ -0,0 +1,50 @@
The ES|QL documentation is composed of static content and generated content.
The static content exists in this directory and can be edited by hand.
However, the sub-directories `_snippets`, `images` and `kibana` contain mostly
generated content.
### _snippets
In `_snippets` there are files that can be included within other files
using the [File Inclusion](https://elastic.github.io/docs-builder/syntax/file_inclusion/)
feature of the Elastic Docs V3 system.
Most, but not all, files in this directory are generated.
In particular the directories `_snippets/functions/*` and `_snippets/operators/*`
contain subdirectories that are mostly generated:
* `description` - description of each function scraped from `@FunctionInfo#description`
* `examples` - examples of each function scraped from `@FunctionInfo#examples`
* `parameters` - description of each function's parameters scraped from `@Param`
* `signature` - railroad diagram of the syntax to invoke each function
* `types` - a table of each combination of supported types for each parameter. These are generated from tests.
* `layout` - a fully generated description for each function
Most functions can use the docs generated in the `layout` directory.
If we need something more custom for the function we can make a file in this
directory that can `include::` any parts of the files above.
To regenerate the files for a function, run its tests using gradle.
For example, to generate docs for the `CASE` function:
```
./gradlew :x-pack:plugin:esql:test -Dtests.class='CaseTests'
```
To regenerate the files for all functions run all of ESQL's tests using gradle:
```
./gradlew :x-pack:plugin:esql:test
```
### images
The `images` directory contains `functions` and `operators` sub-directories with
the `*.svg` files used to describe the syntax of each function or operator.
These are all generated by the same tests that generate the functions and operators docs above.
### kibana
The `kibana` directory contains `definition` and `docs` sub-directories that are generated:
* `kibana/definition` - function definitions for kibana's ESQL editor
* `kibana/docs` - the inline docs for kibana
These are also generated as part of the unit tests described above.


@ -1,53 +0,0 @@
## {{esql}} aggregate functions [esql-agg-functions]
The [`STATS`](/reference/query-languages/esql/esql-commands.md#esql-stats-by) command supports these aggregate functions:
:::{include} lists/aggregation-functions.md
:::
:::{include} functions/avg.md
:::
:::{include} functions/count.md
:::
:::{include} functions/count_distinct.md
:::
:::{include} functions/max.md
:::
:::{include} functions/median.md
:::
:::{include} functions/median_absolute_deviation.md
:::
:::{include} functions/min.md
:::
:::{include} functions/percentile.md
:::
:::{include} functions/st_centroid_agg.md
:::
:::{include} functions/st_extent_agg.md
:::
:::{include} functions/std_dev.md
:::
:::{include} functions/sum.md
:::
:::{include} functions/top.md
:::
:::{include} functions/values.md
:::
:::{include} functions/weighted_avg.md
:::


@ -1,930 +0,0 @@
## {{esql}} aggregate functions [esql-agg-functions]
The [`STATS`](/reference/query-languages/esql/esql-commands.md#esql-stats-by) command supports these aggregate functions:
:::{include} lists/aggregation-functions.md
:::
## `AVG` [esql-avg]
**Syntax**
:::{image} ../../../../images/avg.svg
:alt: Embedded
:class: text-center
:::
**Parameters**
true
**Description**
The average of a numeric field.
**Supported types**
| number | result |
| --- | --- |
| double | double |
| integer | double |
| long | double |
**Examples**
```esql
FROM employees
| STATS AVG(height)
```
| AVG(height):double |
| --- |
| 1.7682 |
The expression can use inline functions. For example, to calculate the average over a multivalued column, first use `MV_AVG` to average the multiple values per row, and use the result with the `AVG` function
```esql
FROM employees
| STATS avg_salary_change = ROUND(AVG(MV_AVG(salary_change)), 10)
```
| avg_salary_change:double |
| --- |
| 1.3904535865 |
## `COUNT` [esql-count]
**Syntax**
:::{image} ../../../../images/count.svg
:alt: Embedded
:class: text-center
:::
**Parameters**
`field`
: Expression that outputs values to be counted. If omitted, equivalent to `COUNT(*)` (the number of rows).
**Description**
Returns the total number (count) of input values.
**Supported types**
| field | result |
| --- | --- |
| boolean | long |
| cartesian_point | long |
| date | long |
| double | long |
| geo_point | long |
| integer | long |
| ip | long |
| keyword | long |
| long | long |
| text | long |
| unsigned_long | long |
| version | long |
**Examples**
```esql
FROM employees
| STATS COUNT(height)
```
| COUNT(height):long |
| --- |
| 100 |
To count the number of rows, use `COUNT()` or `COUNT(*)`
```esql
FROM employees
| STATS count = COUNT(*) BY languages
| SORT languages DESC
```
| count:long | languages:integer |
| --- | --- |
| 10 | null |
| 21 | 5 |
| 18 | 4 |
| 17 | 3 |
| 19 | 2 |
| 15 | 1 |
The expression can use inline functions. This example splits a string into multiple values using the `SPLIT` function and counts the values
```esql
ROW words="foo;bar;baz;qux;quux;foo"
| STATS word_count = COUNT(SPLIT(words, ";"))
```
| word_count:long |
| --- |
| 6 |
To count the number of times an expression returns `TRUE`, use a [`WHERE`](/reference/query-languages/esql/esql-commands.md#esql-where) command to remove rows that shouldn't be included
```esql
ROW n=1
| WHERE n < 0
| STATS COUNT(n)
```
| COUNT(n):long |
| --- |
| 0 |
To count the same stream of data based on two different expressions, use the pattern `COUNT(<expression> OR NULL)`. This builds on the three-valued logic ([3VL]({{wikipedia}}/Three-valued_logic)) of the language: `TRUE OR NULL` is `TRUE`, but `FALSE OR NULL` is `NULL`, plus the way COUNT handles `NULL`s: `COUNT(TRUE)` and `COUNT(FALSE)` are both 1, but `COUNT(NULL)` is 0.
```esql
ROW n=1
| STATS COUNT(n > 0 OR NULL), COUNT(n < 0 OR NULL)
```
| COUNT(n > 0 OR NULL):long | COUNT(n < 0 OR NULL):long |
| --- | --- |
| 1 | 0 |
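The same three-valued-logic trick can be emulated outside ES|QL by standing `None` in for `NULL` (illustrative Python sketch; `count_or_null` is a hypothetical helper):

```python
def count_or_null(rows, predicate) -> int:
    # COUNT(<cond> OR NULL): TRUE OR NULL is TRUE, FALSE OR NULL is NULL,
    # and COUNT ignores NULLs. Emulate NULL with None and count only the
    # rows where the predicate held.
    return sum(
        1 for row in rows
        if (True if predicate(row) else None) is not None
    )
```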
## `COUNT_DISTINCT` [esql-count_distinct]
**Syntax**
:::{image} ../../../../images/count_distinct.svg
:alt: Embedded
:class: text-center
:::
**Parameters**
`field`
: Column or literal for which to count the number of distinct values.
`precision`
: Precision threshold. Refer to [Counts are approximate](../esql-functions-operators.md#esql-agg-count-distinct-approximate). The maximum supported value is 40000. Thresholds above this number will have the same effect as a threshold of 40000. The default value is 3000.
**Description**
Returns the approximate number of distinct values.
**Supported types**
| field | precision | result |
| --- | --- | --- |
| boolean | integer | long |
| boolean | long | long |
| boolean | unsigned_long | long |
| boolean | | long |
| date | integer | long |
| date | long | long |
| date | unsigned_long | long |
| date | | long |
| date_nanos | integer | long |
| date_nanos | long | long |
| date_nanos | unsigned_long | long |
| date_nanos | | long |
| double | integer | long |
| double | long | long |
| double | unsigned_long | long |
| double | | long |
| integer | integer | long |
| integer | long | long |
| integer | unsigned_long | long |
| integer | | long |
| ip | integer | long |
| ip | long | long |
| ip | unsigned_long | long |
| ip | | long |
| keyword | integer | long |
| keyword | long | long |
| keyword | unsigned_long | long |
| keyword | | long |
| long | integer | long |
| long | long | long |
| long | unsigned_long | long |
| long | | long |
| text | integer | long |
| text | long | long |
| text | unsigned_long | long |
| text | | long |
| version | integer | long |
| version | long | long |
| version | unsigned_long | long |
| version | | long |
**Examples**
```esql
FROM hosts
| STATS COUNT_DISTINCT(ip0), COUNT_DISTINCT(ip1)
```
| COUNT_DISTINCT(ip0):long | COUNT_DISTINCT(ip1):long |
| --- | --- |
| 7 | 8 |
With the optional second parameter to configure the precision threshold
```esql
FROM hosts
| STATS COUNT_DISTINCT(ip0, 80000), COUNT_DISTINCT(ip1, 5)
```
| COUNT_DISTINCT(ip0, 80000):long | COUNT_DISTINCT(ip1, 5):long |
| --- | --- |
| 7 | 9 |
The expression can use inline functions. This example splits a string into multiple values using the `SPLIT` function and counts the unique values
```esql
ROW words="foo;bar;baz;qux;quux;foo"
| STATS distinct_word_count = COUNT_DISTINCT(SPLIT(words, ";"))
```
| distinct_word_count:long |
| --- |
| 5 |
### Counts are approximate [esql-agg-count-distinct-approximate]
Computing exact counts requires loading values into a set and returning its size. This doesn't scale when working on high-cardinality sets and/or large values, as the required memory usage and the need to communicate those per-shard sets between nodes would utilize too many resources of the cluster.
This `COUNT_DISTINCT` function is based on the [HyperLogLog++](https://static.googleusercontent.com/media/research.google.com/fr//pubs/archive/40671.pdf) algorithm, which counts based on the hashes of the values with some interesting properties:
* configurable precision, which decides on how to trade memory for accuracy,
* excellent accuracy on low-cardinality sets,
* fixed memory usage: no matter if there are tens or billions of unique values, memory usage only depends on the configured precision.
For a precision threshold of `c`, the implementation that we are using requires about `c * 8` bytes.
The following chart shows how the error varies before and after the threshold:
![cardinality error](/images/cardinality_error.png "")
For all 3 thresholds, counts have been accurate up to the configured threshold. Although not guaranteed, this is likely to be the case. Accuracy in practice depends on the dataset in question. In general, most datasets show consistently good accuracy. Also note that even with a threshold as low as 100, the error remains very low (1-6% as seen in the above graph) even when counting millions of items.
The HyperLogLog++ algorithm depends on the leading zeros of hashed values; the exact distribution of hashes in a dataset can affect the accuracy of the cardinality estimate.
The `COUNT_DISTINCT` function takes an optional second parameter to configure the precision threshold. The `precision_threshold` option allows you to trade memory for accuracy, and defines a unique count below which counts are expected to be close to accurate. Above this value, counts might become a bit more fuzzy. The maximum supported value is 40000; thresholds above this number have the same effect as a threshold of 40000. The default value is `3000`.
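As a rough illustration of the idea (not Elasticsearch's implementation, which includes many more corrections), a toy HyperLogLog-style counter can be sketched in Python: a fixed number of registers each keep the maximum count of leading zero bits seen among hashed values, with a small-range fallback to linear counting:

```python
import hashlib
import math

def hll_estimate(values, p=14):
    """Toy HyperLogLog-style distinct count: 2**p registers, each storing
    the maximum leading-zero rank observed among hashed values."""
    m = 1 << p
    registers = [0] * m
    for v in values:
        h = int.from_bytes(hashlib.sha256(str(v).encode()).digest()[:8], "big")
        idx = h & (m - 1)                      # low p bits pick a register
        w = h >> p                             # remaining 64 - p bits
        rank = (64 - p) - w.bit_length() + 1   # leading zeros + 1
        registers[idx] = max(registers[idx], rank)
    alpha = 0.7213 / (1 + 1.079 / m)           # bias correction for large m
    est = alpha * m * m / sum(2.0 ** -r for r in registers)
    if est <= 2.5 * m:                         # small-range correction:
        zeros = registers.count(0)             # fall back to linear counting
        if zeros:
            est = m * math.log(m / zeros)
    return est
```

Note how memory is fixed at `m` registers no matter how many values are counted, matching the fixed-memory property described above.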
## `MAX` [esql-max]
**Syntax**
:::{image} ../../../../images/max.svg
:alt: Embedded
:class: text-center
:::
**Parameters**
**Description**
The maximum value of a field.
**Supported types**
| field | result |
| --- | --- |
| boolean | boolean |
| date | date |
| date_nanos | date_nanos |
| double | double |
| integer | integer |
| ip | ip |
| keyword | keyword |
| long | long |
| text | keyword |
| version | version |
**Examples**
```esql
FROM employees
| STATS MAX(languages)
```
| MAX(languages):integer |
| --- |
| 5 |
The expression can use inline functions. For example, to calculate the maximum over an average of a multivalued column, use `MV_AVG` to first average the multiple values per row, and use the result with the `MAX` function
```esql
FROM employees
| STATS max_avg_salary_change = MAX(MV_AVG(salary_change))
```
| max_avg_salary_change:double |
| --- |
| 13.75 |
## `MEDIAN` [esql-median]
**Syntax**
:::{image} ../../../../images/median.svg
:alt: Embedded
:class: text-center
:::
**Parameters**
**Description**
The value that is greater than half of all values and less than half of all values, also known as the 50% [`PERCENTILE`](../esql-functions-operators.md#esql-percentile).
::::{note}
Like [`PERCENTILE`](../esql-functions-operators.md#esql-percentile), `MEDIAN` is [usually approximate](../esql-functions-operators.md#esql-percentile-approximate).
::::
**Supported types**
| number | result |
| --- | --- |
| double | double |
| integer | double |
| long | double |
**Examples**
```esql
FROM employees
| STATS MEDIAN(salary), PERCENTILE(salary, 50)
```
| MEDIAN(salary):double | PERCENTILE(salary, 50):double |
| --- | --- |
| 47003 | 47003 |
The expression can use inline functions. For example, to calculate the median of the maximum values of a multivalued column, first use `MV_MAX` to get the maximum value per row, and use the result with the `MEDIAN` function
```esql
FROM employees
| STATS median_max_salary_change = MEDIAN(MV_MAX(salary_change))
```
| median_max_salary_change:double |
| --- |
| 7.69 |
::::{warning}
`MEDIAN` is also [non-deterministic](https://en.wikipedia.org/wiki/Nondeterministic_algorithm). This means you can get slightly different results using the same data.
::::
## `MEDIAN_ABSOLUTE_DEVIATION` [esql-median_absolute_deviation]
**Syntax**
:::{image} ../../../../images/median_absolute_deviation.svg
:alt: Embedded
:class: text-center
:::
**Parameters**
**Description**
Returns the median absolute deviation, a measure of variability. It is a robust statistic, meaning that it is useful for describing data that may have outliers, or may not be normally distributed. For such data it can be more descriptive than standard deviation. It is calculated as the median of each data points deviation from the median of the entire sample. That is, for a random variable `X`, the median absolute deviation is `median(|median(X) - X|)`.
::::{note}
Like [`PERCENTILE`](../esql-functions-operators.md#esql-percentile), `MEDIAN_ABSOLUTE_DEVIATION` is [usually approximate](../esql-functions-operators.md#esql-percentile-approximate).
::::
**Supported types**
| number | result |
| --- | --- |
| double | double |
| integer | double |
| long | double |
**Examples**
```esql
FROM employees
| STATS MEDIAN(salary), MEDIAN_ABSOLUTE_DEVIATION(salary)
```
| MEDIAN(salary):double | MEDIAN_ABSOLUTE_DEVIATION(salary):double |
| --- | --- |
| 47003 | 10096.5 |
The expression can use inline functions. For example, to calculate the median absolute deviation of the maximum values of a multivalued column, first use `MV_MAX` to get the maximum value per row, and use the result with the `MEDIAN_ABSOLUTE_DEVIATION` function
```esql
FROM employees
| STATS m_a_d_max_salary_change = MEDIAN_ABSOLUTE_DEVIATION(MV_MAX(salary_change))
```
| m_a_d_max_salary_change:double |
| --- |
| 5.69 |
::::{warning}
`MEDIAN_ABSOLUTE_DEVIATION` is also [non-deterministic](https://en.wikipedia.org/wiki/Nondeterministic_algorithm). This means you can get slightly different results using the same data.
::::
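The definition above, `median(|median(X) - X|)`, is easy to check with a short Python sketch (this computes the exact value; the {{esql}} function itself is usually approximate, as noted):

```python
from statistics import median

def median_absolute_deviation(xs):
    """Exact median(|median(xs) - x|), per the definition above."""
    m = median(xs)
    return median(abs(m - x) for x in xs)
```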
## `MIN` [esql-min]
**Syntax**
:::{image} ../../../../images/min.svg
:alt: Embedded
:class: text-center
:::
**Parameters**
**Description**
The minimum value of a field.
**Supported types**
| field | result |
| --- | --- |
| boolean | boolean |
| date | date |
| date_nanos | date_nanos |
| double | double |
| integer | integer |
| ip | ip |
| keyword | keyword |
| long | long |
| text | keyword |
| version | version |
**Examples**
```esql
FROM employees
| STATS MIN(languages)
```
| MIN(languages):integer |
| --- |
| 1 |
The expression can use inline functions. For example, to calculate the minimum over an average of a multivalued column, use `MV_AVG` to first average the multiple values per row, and use the result with the `MIN` function
```esql
FROM employees
| STATS min_avg_salary_change = MIN(MV_AVG(salary_change))
```
| min_avg_salary_change:double |
| --- |
| -8.46 |
## `PERCENTILE` [esql-percentile]
**Syntax**
:::{image} ../../../../images/percentile.svg
:alt: Embedded
:class: text-center
:::
**Parameters**
**Description**
Returns the value at which a certain percentage of observed values occur. For example, the 95th percentile is the value which is greater than 95% of the observed values and the 50th percentile is the `MEDIAN`.
**Supported types**
| number | percentile | result |
| --- | --- | --- |
| double | double | double |
| double | integer | double |
| double | long | double |
| integer | double | double |
| integer | integer | double |
| integer | long | double |
| long | double | double |
| long | integer | double |
| long | long | double |
**Examples**
```esql
FROM employees
| STATS p0 = PERCENTILE(salary, 0)
, p50 = PERCENTILE(salary, 50)
, p99 = PERCENTILE(salary, 99)
```
| p0:double | p50:double | p99:double |
| --- | --- | --- |
| 25324 | 47003 | 74970.29 |
The expression can use inline functions. For example, to calculate a percentile of the maximum values of a multivalued column, first use `MV_MAX` to get the maximum value per row, and use the result with the `PERCENTILE` function
```esql
FROM employees
| STATS p80_max_salary_change = PERCENTILE(MV_MAX(salary_change), 80)
```
| p80_max_salary_change:double |
| --- |
| 12.132 |
### `PERCENTILE` is (usually) approximate [esql-percentile-approximate]
There are many different algorithms to calculate percentiles. The naive implementation simply stores all the values in a sorted array. To find the 50th percentile, you simply find the value that is at `my_array[count(my_array) * 0.5]`.
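The naive approach just described can be sketched in Python (for illustration only; the index arithmetic is the simple truncating form from the text):

```python
def naive_percentile(values, q):
    """Exact percentile: sort everything, then index q percent of the way
    into the sorted array. Memory grows linearly with the input."""
    my_array = sorted(values)
    i = min(int(len(my_array) * q / 100), len(my_array) - 1)
    return my_array[i]
```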
Clearly, the naive implementation does not scale: the sorted array grows linearly with the number of values in your dataset. To calculate percentiles across potentially billions of values in an Elasticsearch cluster, *approximate* percentiles are calculated.
The algorithm used by the `percentile` metric is called TDigest (introduced by Ted Dunning in [Computing Accurate Quantiles using T-Digests](https://github.com/tdunning/t-digest/blob/master/docs/t-digest-paper/histo.pdf)).
When using this metric, there are a few guidelines to keep in mind:
* Accuracy is proportional to `q(1-q)`. This means that extreme percentiles (e.g. 99%) are more accurate than less extreme percentiles, such as the median
* For small sets of values, percentiles are highly accurate (and potentially 100% accurate if the data is small enough).
* As the quantity of values in a bucket grows, the algorithm begins to approximate the percentiles. It is effectively trading accuracy for memory savings. The exact level of inaccuracy is difficult to generalize, since it depends on your data distribution and volume of data being aggregated
The following chart shows the relative error on a uniform distribution depending on the number of collected values and the requested percentile:
![percentiles error](/images/percentiles_error.png "")
It shows how precision is better for extreme percentiles. The reason why error diminishes for large numbers of values is that the law of large numbers makes the distribution of values more and more uniform, so the t-digest tree can do a better job at summarizing it. This would not be the case on more skewed distributions.
::::{warning}
`PERCENTILE` is also [non-deterministic](https://en.wikipedia.org/wiki/Nondeterministic_algorithm). This means you can get slightly different results using the same data.
::::
## `ST_CENTROID_AGG` [esql-st_centroid_agg]
**Syntax**
:::{image} ../../../../images/st_centroid_agg.svg
:alt: Embedded
:class: text-center
:::
**Parameters**
**Description**
Calculate the spatial centroid over a field with spatial point geometry type.
**Supported types**
| field | result |
| --- | --- |
| cartesian_point | cartesian_point |
| geo_point | geo_point |
**Example**
```esql
FROM airports
| STATS centroid=ST_CENTROID_AGG(location)
```
| centroid:geo_point |
| --- |
| POINT(-0.030548143003023033 24.37553649504829) |
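As a naive planar illustration (the real function operates on geo and cartesian points and is not necessarily implemented this way), the centroid of a set of points is the arithmetic mean of their coordinates:

```python
def centroid(points):
    """Arithmetic mean of (x, y) coordinates; ignores spherical geometry."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)
```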
## `ST_EXTENT_AGG` [esql-st_extent_agg]
**Syntax**
:::{image} ../../../../images/st_extent_agg.svg
:alt: Embedded
:class: text-center
:::
**Parameters**
**Description**
Calculate the spatial extent over a field with geometry type. Returns a bounding box for all values of the field.
**Supported types**
| field | result |
| --- | --- |
| cartesian_point | cartesian_shape |
| cartesian_shape | cartesian_shape |
| geo_point | geo_shape |
| geo_shape | geo_shape |
**Example**
```esql
FROM airports
| WHERE country == "India"
| STATS extent = ST_EXTENT_AGG(location)
```
| extent:geo_shape |
| --- |
| BBOX (70.77995480038226, 91.5882289968431, 33.9830909203738, 8.47650992218405) |
## `STD_DEV` [esql-std_dev]
**Syntax**
:::{image} ../../../../images/std_dev.svg
:alt: Embedded
:class: text-center
:::
**Parameters**
**Description**
The standard deviation of a numeric field.
**Supported types**
| number | result |
| --- | --- |
| double | double |
| integer | double |
| long | double |
**Examples**
```esql
FROM employees
| STATS STD_DEV(height)
```
| STD_DEV(height):double |
| --- |
| 0.20637044362020449 |
The expression can use inline functions. For example, to calculate the standard deviation of each employees maximum salary changes, first use `MV_MAX` on each row, and then use `STD_DEV` on the result
```esql
FROM employees
| STATS stddev_salary_change = STD_DEV(MV_MAX(salary_change))
```
| stddev_salary_change:double |
| --- |
| 6.875829592924112 |
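A common way to compute a standard deviation in a single pass over the data is Welford's algorithm. This sketch (an assumption for illustration, not necessarily the exact implementation) computes the population standard deviation:

```python
import math

def welford_std_dev(xs):
    """Single-pass population standard deviation (Welford's algorithm)."""
    n, mean, m2 = 0, 0.0, 0.0
    for x in xs:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)   # uses the updated mean
    return math.sqrt(m2 / n)
```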
## `SUM` [esql-sum]
**Syntax**
:::{image} ../../../../images/sum.svg
:alt: Embedded
:class: text-center
:::
**Parameters**
**Description**
The sum of a numeric expression.
**Supported types**
| number | result |
| --- | --- |
| double | double |
| integer | long |
| long | long |
**Examples**
```esql
FROM employees
| STATS SUM(languages)
```
| SUM(languages):long |
| --- |
| 281 |
The expression can use inline functions. For example, to calculate the sum of each employees maximum salary changes, apply the `MV_MAX` function to each row and then sum the results
```esql
FROM employees
| STATS total_salary_changes = SUM(MV_MAX(salary_change))
```
| total_salary_changes:double |
| --- |
| 446.75 |
## `TOP` [esql-top]
**Syntax**
:::{image} ../../../../images/top.svg
:alt: Embedded
:class: text-center
:::
**Parameters**
`field`
: The field to collect the top values for.
`limit`
: The maximum number of values to collect.
`order`
: The order to calculate the top values. Either `asc` or `desc`.
**Description**
Collects the top values for a field. Includes repeated values.
**Supported types**
| field | limit | order | result |
| --- | --- | --- | --- |
| boolean | integer | keyword | boolean |
| date | integer | keyword | date |
| double | integer | keyword | double |
| integer | integer | keyword | integer |
| ip | integer | keyword | ip |
| keyword | integer | keyword | keyword |
| long | integer | keyword | long |
| text | integer | keyword | keyword |
**Example**
```esql
FROM employees
| STATS top_salaries = TOP(salary, 3, "desc"), top_salary = MAX(salary)
```
| top_salaries:integer | top_salary:integer |
| --- | --- |
| [74999, 74970, 74572] | 74999 |
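The semantics, though not the distributed implementation, can be sketched with Python's `heapq` (the `top` helper here is a hypothetical stand-in):

```python
import heapq

def top(values, limit, order="desc"):
    """Collect the top `limit` values, repeats included."""
    if order == "desc":
        return heapq.nlargest(limit, values)
    return heapq.nsmallest(limit, values)
```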
## `VALUES` [esql-values]
::::{warning}
Do not use in production environments. This functionality is in technical preview and may be changed or removed in a future release. Elastic will work to fix any issues, but features in technical preview are not subject to the support SLA of official GA features.
::::
**Syntax**
:::{image} ../../../../images/values.svg
:alt: Embedded
:class: text-center
:::
**Parameters**
**Description**
Returns all values in a group as a multivalued field. The order of the returned values isnt guaranteed. If you need the values returned in order use [`MV_SORT`](../esql-functions-operators.md#esql-mv_sort).
**Supported types**
| field | result |
| --- | --- |
| boolean | boolean |
| date | date |
| date_nanos | date_nanos |
| double | double |
| integer | integer |
| ip | ip |
| keyword | keyword |
| long | long |
| text | keyword |
| version | version |
**Example**
```esql
FROM employees
| EVAL first_letter = SUBSTRING(first_name, 0, 1)
| STATS first_name=MV_SORT(VALUES(first_name)) BY first_letter
| SORT first_letter
```
| first_name:keyword | first_letter:keyword |
| --- | --- |
| [Alejandro, Amabile, Anneke, Anoosh, Arumugam] | A |
| [Basil, Berhard, Berni, Bezalel, Bojan, Breannda, Brendon] | B |
| [Charlene, Chirstian, Claudi, Cristinel] | C |
| [Danel, Divier, Domenick, Duangkaew] | D |
| [Ebbe, Eberhardt, Erez] | E |
| Florian | F |
| [Gao, Georgi, Georgy, Gino, Guoxiang] | G |
| [Heping, Hidefumi, Hilari, Hironobu, Hironoby, Hisao] | H |
| [Jayson, Jungsoon] | J |
| [Kazuhide, Kazuhito, Kendra, Kenroku, Kshitij, Kwee, Kyoichi] | K |
| [Lillian, Lucien] | L |
| [Magy, Margareta, Mary, Mayuko, Mayumi, Mingsen, Mokhtar, Mona, Moss] | M |
| Otmar | O |
| [Parto, Parviz, Patricio, Prasadram, Premal] | P |
| [Ramzi, Remzi, Reuven] | R |
| [Sailaja, Saniya, Sanjiv, Satosi, Shahaf, Shir, Somnath, Sreekrishna, Sudharsan, Sumant, Suzette] | S |
| [Tse, Tuval, Tzvetan] | T |
| [Udi, Uri] | U |
| [Valdiodio, Valter, Vishv] | V |
| Weiyi | W |
| Xinglin | X |
| [Yinghua, Yishay, Yongqiao] | Y |
| [Zhongwei, Zvonko] | Z |
| null | null |
::::{warning}
This can use a significant amount of memory, and ES|QL doesn't yet grow aggregations beyond memory. This aggregation works until it collects more values than can fit into memory, at which point it fails the query with a [Circuit Breaker Error](docs-content://troubleshoot/elasticsearch/circuit-breaker-errors.md).
::::
## `WEIGHTED_AVG` [esql-weighted_avg]
**Syntax**
:::{image} ../../../../images/weighted_avg.svg
:alt: Embedded
:class: text-center
:::
**Parameters**
`number`
: A numeric value.
`weight`
: A numeric weight.
**Description**
The weighted average of a numeric expression.
**Supported types**
| number | weight | result |
| --- | --- | --- |
| double | double | double |
| double | integer | double |
| double | long | double |
| integer | double | double |
| integer | integer | double |
| integer | long | double |
| long | double | double |
| long | integer | double |
| long | long | double |
**Example**
```esql
FROM employees
| STATS w_avg = WEIGHTED_AVG(salary, height) by languages
| EVAL w_avg = ROUND(w_avg)
| KEEP w_avg, languages
| SORT languages
```
| w_avg:double | languages:integer |
| --- | --- |
| 51464.0 | 1 |
| 48477.0 | 2 |
| 52379.0 | 3 |
| 47990.0 | 4 |
| 42119.0 | 5 |
| 52142.0 | null |
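The underlying formula is `sum(value * weight) / sum(weight)`; a minimal sketch:

```python
def weighted_avg(values, weights):
    """sum(value * weight) / sum(weight), pairing values with their weights."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)
```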
## {{esql}} conditional functions and expressions [esql-conditional-functions-and-expressions]
Conditional functions return one of their arguments by evaluating in an if-else manner. {{esql}} supports these conditional functions:
:::{include} lists/conditional-functions-and-expressions.md
:::
## `CASE` [esql-case]
**Syntax**
:::{image} ../../../../images/case.svg
:alt: Embedded
:class: text-center
:::
**Parameters**
`condition`
: A condition.
`trueValue`
: The value thats returned when the corresponding condition is the first to evaluate to `true`. The default value is returned when no condition matches.
`elseValue`
: The value thats returned when no condition evaluates to `true`.
**Description**
Accepts pairs of conditions and values. The function returns the value that belongs to the first condition that evaluates to `true`. If the number of arguments is odd, the last argument is the default value which is returned when no condition matches. If the number of arguments is even, and no condition matches, the function returns `null`.
**Supported types**
| condition | trueValue | elseValue | result |
| --- | --- | --- | --- |
| boolean | boolean | boolean | boolean |
| boolean | boolean | | boolean |
| boolean | cartesian_point | cartesian_point | cartesian_point |
| boolean | cartesian_point | | cartesian_point |
| boolean | cartesian_shape | cartesian_shape | cartesian_shape |
| boolean | cartesian_shape | | cartesian_shape |
| boolean | date | date | date |
| boolean | date | | date |
| boolean | date_nanos | date_nanos | date_nanos |
| boolean | date_nanos | | date_nanos |
| boolean | double | double | double |
| boolean | double | | double |
| boolean | geo_point | geo_point | geo_point |
| boolean | geo_point | | geo_point |
| boolean | geo_shape | geo_shape | geo_shape |
| boolean | geo_shape | | geo_shape |
| boolean | integer | integer | integer |
| boolean | integer | | integer |
| boolean | ip | ip | ip |
| boolean | ip | | ip |
| boolean | keyword | keyword | keyword |
| boolean | keyword | text | keyword |
| boolean | keyword | | keyword |
| boolean | long | long | long |
| boolean | long | | long |
| boolean | text | keyword | keyword |
| boolean | text | text | keyword |
| boolean | text | | keyword |
| boolean | unsigned_long | unsigned_long | unsigned_long |
| boolean | unsigned_long | | unsigned_long |
| boolean | version | version | version |
| boolean | version | | version |
**Examples**
Determine whether employees are monolingual, bilingual, or polyglot:
```esql
FROM employees
| EVAL type = CASE(
languages <= 1, "monolingual",
languages <= 2, "bilingual",
"polyglot")
| KEEP emp_no, languages, type
```
| emp_no:integer | languages:integer | type:keyword |
| --- | --- | --- |
| 10001 | 2 | bilingual |
| 10002 | 5 | polyglot |
| 10003 | 4 | polyglot |
| 10004 | 5 | polyglot |
| 10005 | 1 | monolingual |
Calculate the total connection success rate based on log messages:
```esql
FROM sample_data
| EVAL successful = CASE(
STARTS_WITH(message, "Connected to"), 1,
message == "Connection error", 0
)
| STATS success_rate = AVG(successful)
```
| success_rate:double |
| --- |
| 0.5 |
Calculate an hourly error rate as a percentage of the total number of log messages:
```esql
FROM sample_data
| EVAL error = CASE(message LIKE "*error*", 1, 0)
| EVAL hour = DATE_TRUNC(1 hour, @timestamp)
| STATS error_rate = AVG(error) by hour
| SORT hour
```
| error_rate:double | hour:date |
| --- | --- |
| 0.0 | 2023-10-23T12:00:00.000Z |
| 0.6 | 2023-10-23T13:00:00.000Z |
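The first-true-condition semantics described above can be sketched in Python (the `case` helper is a hypothetical illustration, not part of {{esql}}):

```python
def case(*args):
    """Pairs of (condition, value), with an optional trailing default.
    Returns the value of the first condition that is true; if the argument
    count is even and nothing matches, returns None."""
    pairs, default = (args, None) if len(args) % 2 == 0 else (args[:-1], args[-1])
    for cond, value in zip(pairs[::2], pairs[1::2]):
        if cond:
            return value
    return default
```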
## `COALESCE` [esql-coalesce]
**Syntax**
:::{image} ../../../../images/coalesce.svg
:alt: Embedded
:class: text-center
:::
**Parameters**
`first`
: Expression to evaluate.
`rest`
: Other expression to evaluate.
**Description**
Returns the first of its arguments that is not null. If all arguments are null, it returns `null`.
**Supported types**
| first | rest | result |
| --- | --- | --- |
| boolean | boolean | boolean |
| boolean | | boolean |
| cartesian_point | cartesian_point | cartesian_point |
| cartesian_shape | cartesian_shape | cartesian_shape |
| date | date | date |
| date_nanos | date_nanos | date_nanos |
| geo_point | geo_point | geo_point |
| geo_shape | geo_shape | geo_shape |
| integer | integer | integer |
| integer | | integer |
| ip | ip | ip |
| keyword | keyword | keyword |
| keyword | | keyword |
| long | long | long |
| long | | long |
| text | text | keyword |
| text | | keyword |
| version | version | version |
**Example**
```esql
ROW a=null, b="b"
| EVAL COALESCE(a, b)
```
| a:null | b:keyword | COALESCE(a, b):keyword |
| --- | --- | --- |
| null | b | b |
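The behavior is simply "first non-null argument"; a one-line Python sketch, with `None` standing in for `null`:

```python
def coalesce(*args):
    """Return the first argument that is not None, else None."""
    return next((a for a in args if a is not None), None)
```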
## `GREATEST` [esql-greatest]
**Syntax**
:::{image} ../../../../images/greatest.svg
:alt: Embedded
:class: text-center
:::
**Parameters**
`first`
: First of the columns to evaluate.
`rest`
: The rest of the columns to evaluate.
**Description**
Returns the maximum value from multiple columns. This is similar to [`MV_MAX`](../esql-functions-operators.md#esql-mv_max) except it is intended to run on multiple columns at once.
::::{note}
When run on `keyword` or `text` fields, this returns the last string in alphabetical order. When run on `boolean` columns this will return `true` if any values are `true`.
::::
**Supported types**
| first | rest | result |
| --- | --- | --- |
| boolean | boolean | boolean |
| boolean | | boolean |
| date | date | date |
| date_nanos | date_nanos | date_nanos |
| double | double | double |
| integer | integer | integer |
| integer | | integer |
| ip | ip | ip |
| keyword | keyword | keyword |
| keyword | | keyword |
| long | long | long |
| long | | long |
| text | text | keyword |
| text | | keyword |
| version | version | version |
**Example**
```esql
ROW a = 10, b = 20
| EVAL g = GREATEST(a, b)
```
| a:integer | b:integer | g:integer |
| --- | --- | --- |
| 10 | 20 | 20 |
## `LEAST` [esql-least]
**Syntax**
:::{image} ../../../../images/least.svg
:alt: Embedded
:class: text-center
:::
**Parameters**
`first`
: First of the columns to evaluate.
`rest`
: The rest of the columns to evaluate.
**Description**
Returns the minimum value from multiple columns. This is similar to [`MV_MIN`](../esql-functions-operators.md#esql-mv_min) except it is intended to run on multiple columns at once.
**Supported types**
| first | rest | result |
| --- | --- | --- |
| boolean | boolean | boolean |
| boolean | | boolean |
| date | date | date |
| date_nanos | date_nanos | date_nanos |
| double | double | double |
| integer | integer | integer |
| integer | | integer |
| ip | ip | ip |
| keyword | keyword | keyword |
| keyword | | keyword |
| long | long | long |
| long | | long |
| text | text | keyword |
| text | | keyword |
| version | version | version |
**Example**
```esql
ROW a = 10, b = 20
| EVAL l = LEAST(a, b)
```
| a:integer | b:integer | l:integer |
| --- | --- | --- |
| 10 | 20 | 10 |
## {{esql}} date-time functions [esql-date-time-functions]
{{esql}} supports these date-time functions:
:::{include} lists/date-time-functions.md
:::
## `DATE_DIFF` [esql-date_diff]
**Syntax**
:::{image} ../../../../images/date_diff.svg
:alt: Embedded
:class: text-center
:::
**Parameters**
`unit`
: Time difference unit
`startTimestamp`
: A string representing a start timestamp
`endTimestamp`
: A string representing an end timestamp
**Description**
Subtracts the `startTimestamp` from the `endTimestamp` and returns the difference in multiples of `unit`. If `startTimestamp` is later than the `endTimestamp`, negative values are returned.
**Datetime difference units**

| unit | abbreviations |
| --- | --- |
| year | years, yy, yyyy |
| quarter | quarters, qq, q |
| month | months, mm, m |
| dayofyear | dy, y |
| day | days, dd, d |
| week | weeks, wk, ww |
| weekday | weekdays, dw |
| hour | hours, hh |
| minute | minutes, mi, n |
| second | seconds, ss, s |
| millisecond | milliseconds, ms |
| microsecond | microseconds, mcs |
| nanosecond | nanoseconds, ns |
Note that while there is an overlap between the function's supported units and {{esql}}'s supported time span literals, these sets are distinct and not interchangeable. Similarly, the supported abbreviations are conveniently shared with implementations of this function in other established products, and not necessarily common with the date-time nomenclature used by {{es}}.
**Supported types**
| unit | startTimestamp | endTimestamp | result |
| --- | --- | --- | --- |
| keyword | date | date | integer |
| keyword | date | date_nanos | integer |
| keyword | date_nanos | date | integer |
| keyword | date_nanos | date_nanos | integer |
| text | date | date | integer |
| text | date | date_nanos | integer |
| text | date_nanos | date | integer |
| text | date_nanos | date_nanos | integer |
**Examples**
```esql
ROW date1 = TO_DATETIME("2023-12-02T11:00:00.000Z"), date2 = TO_DATETIME("2023-12-02T11:00:00.001Z")
| EVAL dd_ms = DATE_DIFF("microseconds", date1, date2)
```
| date1:date | date2:date | dd_ms:integer |
| --- | --- | --- |
| 2023-12-02T11:00:00.000Z | 2023-12-02T11:00:00.001Z | 1000 |
When subtracting in calendar units, such as year or month, only the fully elapsed units are counted. To avoid this and also obtain remainders, switch to the next smaller unit and do the date math accordingly.
```esql
ROW end_23=TO_DATETIME("2023-12-31T23:59:59.999Z"),
start_24=TO_DATETIME("2024-01-01T00:00:00.000Z"),
  end_24=TO_DATETIME("2024-12-31T23:59:59.999Z")
| EVAL end23_to_start24=DATE_DIFF("year", end_23, start_24)
| EVAL end23_to_end24=DATE_DIFF("year", end_23, end_24)
| EVAL start_to_end_24=DATE_DIFF("year", start_24, end_24)
```
| end_23:date | start_24:date | end_24:date | end23_to_start24:integer | end23_to_end24:integer | start_to_end_24:integer |
| --- | --- | --- | --- | --- | --- |
| 2023-12-31T23:59:59.999Z | 2024-01-01T00:00:00.000Z | 2024-12-31T23:59:59.999Z | 0 | 1 | 0 |
## `DATE_EXTRACT` [esql-date_extract]
**Syntax**
:::{image} ../../../../images/date_extract.svg
:alt: Embedded
:class: text-center
:::
**Parameters**
`datePart`
: Part of the date to extract. Can be: `aligned_day_of_week_in_month`, `aligned_day_of_week_in_year`, `aligned_week_of_month`, `aligned_week_of_year`, `ampm_of_day`, `clock_hour_of_ampm`, `clock_hour_of_day`, `day_of_month`, `day_of_week`, `day_of_year`, `epoch_day`, `era`, `hour_of_ampm`, `hour_of_day`, `instant_seconds`, `micro_of_day`, `micro_of_second`, `milli_of_day`, `milli_of_second`, `minute_of_day`, `minute_of_hour`, `month_of_year`, `nano_of_day`, `nano_of_second`, `offset_seconds`, `proleptic_month`, `second_of_day`, `second_of_minute`, `year`, or `year_of_era`. Refer to [java.time.temporal.ChronoField](https://docs.oracle.com/javase/8/docs/api/java/time/temporal/ChronoField.html) for a description of these values. If `null`, the function returns `null`.
`date`
: Date expression. If `null`, the function returns `null`.
**Description**
Extracts parts of a date, like year, month, day, hour.
**Supported types**
| datePart | date | result |
| --- | --- | --- |
| keyword | date | long |
| keyword | date_nanos | long |
| text | date | long |
| text | date_nanos | long |
**Examples**
```esql
ROW date = DATE_PARSE("yyyy-MM-dd", "2022-05-06")
| EVAL year = DATE_EXTRACT("year", date)
```
| date:date | year:long |
| --- | --- |
| 2022-05-06T00:00:00.000Z | 2022 |
Find all events that occurred outside of business hours (before 9 AM or after 5 PM), on any given date:
```esql
FROM sample_data
| WHERE DATE_EXTRACT("hour_of_day", @timestamp) < 9 OR DATE_EXTRACT("hour_of_day", @timestamp) >= 17
```
| @timestamp:date | client_ip:ip | event_duration:long | message:keyword |
| --- | --- | --- | --- |
## `DATE_FORMAT` [esql-date_format]
**Syntax**
:::{image} ../../../../images/date_format.svg
:alt: Embedded
:class: text-center
:::
**Parameters**
`dateFormat`
: Date format (optional). If no format is specified, the `yyyy-MM-dd'T'HH:mm:ss.SSSZ` format is used. If `null`, the function returns `null`.
`date`
: Date expression. If `null`, the function returns `null`.
**Description**
Returns a string representation of a date, in the provided format.
**Supported types**
| dateFormat | date | result |
| --- | --- | --- |
| date | | keyword |
| date_nanos | | keyword |
| keyword | date | keyword |
| keyword | date_nanos | keyword |
| text | date | keyword |
| text | date_nanos | keyword |
**Example**
```esql
FROM employees
| KEEP first_name, last_name, hire_date
| EVAL hired = DATE_FORMAT("yyyy-MM-dd", hire_date)
```
| first_name:keyword | last_name:keyword | hire_date:date | hired:keyword |
| --- | --- | --- | --- |
| Alejandro | McAlpine | 1991-06-26T00:00:00.000Z | 1991-06-26 |
| Amabile | Gomatam | 1992-11-18T00:00:00.000Z | 1992-11-18 |
| Anneke | Preusig | 1989-06-02T00:00:00.000Z | 1989-06-02 |
## `DATE_PARSE` [esql-date_parse]
**Syntax**
:::{image} ../../../../images/date_parse.svg
:alt: Embedded
:class: text-center
:::
**Parameters**
`datePattern`
: The date format. Refer to the [`DateTimeFormatter` documentation](https://docs.oracle.com/en/java/javase/14/docs/api/java.base/java/time/format/DateTimeFormatter.html) for the syntax. If `null`, the function returns `null`.
`dateString`
: Date expression as a string. If `null` or an empty string, the function returns `null`.
**Description**
Returns a date by parsing the second argument using the format specified in the first argument.
**Supported types**
| datePattern | dateString | result |
| --- | --- | --- |
| keyword | keyword | date |
| keyword | text | date |
| text | keyword | date |
| text | text | date |
**Example**
```esql
ROW date_string = "2022-05-06"
| EVAL date = DATE_PARSE("yyyy-MM-dd", date_string)
```
| date_string:keyword | date:date |
| --- | --- |
| 2022-05-06 | 2022-05-06T00:00:00.000Z |
## `DATE_TRUNC` [esql-date_trunc]
**Syntax**
:::{image} ../../../../images/date_trunc.svg
:alt: Embedded
:class: text-center
:::
**Parameters**
`interval`
: Interval; expressed using the timespan literal syntax.
`date`
: Date expression
**Description**
Rounds down a date to the closest interval.
**Supported types**
| interval | date | result |
| --- | --- | --- |
| date_period | date | date |
| date_period | date_nanos | date_nanos |
| time_duration | date | date |
| time_duration | date_nanos | date_nanos |
**Examples**
```esql
FROM employees
| KEEP first_name, last_name, hire_date
| EVAL year_hired = DATE_TRUNC(1 year, hire_date)
```
| first_name:keyword | last_name:keyword | hire_date:date | year_hired:date |
| --- | --- | --- | --- |
| Alejandro | McAlpine | 1991-06-26T00:00:00.000Z | 1991-01-01T00:00:00.000Z |
| Amabile | Gomatam | 1992-11-18T00:00:00.000Z | 1992-01-01T00:00:00.000Z |
| Anneke | Preusig | 1989-06-02T00:00:00.000Z | 1989-01-01T00:00:00.000Z |
Combine `DATE_TRUNC` with [`STATS`](/reference/query-languages/esql/esql-commands.md#esql-stats-by) to create date histograms. For example, the number of hires per year:
```esql
FROM employees
| EVAL year = DATE_TRUNC(1 year, hire_date)
| STATS hires = COUNT(emp_no) BY year
| SORT year
```
| hires:long | year:date |
| --- | --- |
| 11 | 1985-01-01T00:00:00.000Z |
| 11 | 1986-01-01T00:00:00.000Z |
| 15 | 1987-01-01T00:00:00.000Z |
| 9 | 1988-01-01T00:00:00.000Z |
| 13 | 1989-01-01T00:00:00.000Z |
| 12 | 1990-01-01T00:00:00.000Z |
| 6 | 1991-01-01T00:00:00.000Z |
| 8 | 1992-01-01T00:00:00.000Z |
| 3 | 1993-01-01T00:00:00.000Z |
| 4 | 1994-01-01T00:00:00.000Z |
| 5 | 1995-01-01T00:00:00.000Z |
| 1 | 1996-01-01T00:00:00.000Z |
| 1 | 1997-01-01T00:00:00.000Z |
| 1 | 1999-01-01T00:00:00.000Z |
Or an hourly error rate:
```esql
FROM sample_data
| EVAL error = CASE(message LIKE "*error*", 1, 0)
| EVAL hour = DATE_TRUNC(1 hour, @timestamp)
| STATS error_rate = AVG(error) by hour
| SORT hour
```
| error_rate:double | hour:date |
| --- | --- |
| 0.0 | 2023-10-23T12:00:00.000Z |
| 0.6 | 2023-10-23T13:00:00.000Z |
## `NOW` [esql-now]
**Syntax**
:::{image} ../../../../images/now.svg
:alt: Embedded
:class: text-center
:::
**Parameters**
**Description**
Returns current date and time.
**Supported types**
| result |
| --- |
| date |
**Examples**
```esql
ROW current_date = NOW()
```
| current_date:date |
| --- |
To retrieve logs from the last hour:
```esql
FROM sample_data
| WHERE @timestamp > NOW() - 1 hour
```
| @timestamp:date | client_ip:ip | event_duration:long | message:keyword |
| --- | --- | --- | --- |
## `ABS` [esql-abs]
**Syntax**
:::{image} ../../../../../images/abs.svg
:alt: Embedded
:class: text-center
:::
**Parameters**
`number`
: Numeric expression. If `null`, the function returns `null`.
**Description**
Returns the absolute value.
**Supported types**
| number | result |
| --- | --- |
| double | double |
| integer | integer |
| long | long |
| unsigned_long | unsigned_long |
**Examples**
```esql
ROW number = -1.0
| EVAL abs_number = ABS(number)
```
| number:double | abs_number:double |
| --- | --- |
| -1.0 | 1.0 |
```esql
FROM employees
| KEEP first_name, last_name, height
| EVAL abs_height = ABS(0.0 - height)
```
| first_name:keyword | last_name:keyword | height:double | abs_height:double |
| --- | --- | --- | --- |
| Alejandro | McAlpine | 1.48 | 1.48 |
| Amabile | Gomatam | 2.09 | 2.09 |
| Anneke | Preusig | 1.56 | 1.56 |
## `ACOS` [esql-acos]
**Syntax**
:::{image} ../../../../../images/acos.svg
:alt: Embedded
:class: text-center
:::
**Parameters**
`number`
: Number between -1 and 1. If `null`, the function returns `null`.
**Description**
Returns the [arccosine](https://en.wikipedia.org/wiki/Inverse_trigonometric_functions) of `n` as an angle, expressed in radians.
**Supported types**
| number | result |
| --- | --- |
| double | double |
| integer | double |
| long | double |
| unsigned_long | double |
**Example**
```esql
ROW a=.9
| EVAL acos=ACOS(a)
```
| a:double | acos:double |
| --- | --- |
| .9 | 0.45102681179626236 |
% This is generated by ESQL's AbstractFunctionTestCase. Do not edit it. See ../README.md for how to regenerate it.
### Counts are approximate [esql-agg-count-distinct-approximate]
Computing exact counts requires loading values into a set and returning its
size. This doesn't scale when working on high-cardinality sets and/or large
values as the required memory usage and the need to communicate those
per-shard sets between nodes would utilize too many resources of the cluster.
This `COUNT_DISTINCT` function is based on the
[HyperLogLog++](https://static.googleusercontent.com/media/research.google.com/fr//pubs/archive/40671.pdf)
algorithm, which counts based on the hashes of the values with some interesting
properties:
:::{include} /reference/data-analysis/aggregations/_snippets/search-aggregations-metrics-cardinality-aggregation-explanation.md
:::
The `COUNT_DISTINCT` function takes an optional second parameter to configure
the precision threshold. The `precision_threshold` option allows trading memory
for accuracy, and defines a unique count below which counts are expected to be
close to accurate. Above this value, counts might become a bit more fuzzy. The
maximum supported value is `40000`; thresholds above this number have the
same effect as a threshold of `40000`. The default value is `3000`.
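As a sketch of the syntax (assuming an index `hosts` with `ip`-typed fields `ip0` and `ip1`), the precision threshold is passed as an optional second argument:

```esql
FROM hosts
| STATS COUNT_DISTINCT(ip0), COUNT_DISTINCT(ip1, 80000)
```

Requesting a threshold above the default trades extra memory for counts that stay close to exact over a larger range of cardinalities.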
% This is generated by ESQL's AbstractFunctionTestCase. Do not edit it. See ../README.md for how to regenerate it.
::::{warning}
`MEDIAN` is also [non-deterministic](https://en.wikipedia.org/wiki/Nondeterministic_algorithm).
This means you can get slightly different results using the same data.
::::
% This is generated by ESQL's AbstractFunctionTestCase. Do not edit it. See ../README.md for how to regenerate it.
::::{warning}
`MEDIAN_ABSOLUTE_DEVIATION` is also [non-deterministic](https://en.wikipedia.org/wiki/Nondeterministic_algorithm).
This means you can get slightly different results using the same data.
::::
% This is generated by ESQL's AbstractFunctionTestCase. Do not edit it. See ../README.md for how to regenerate it.
### `PERCENTILE` is (usually) approximate [esql-percentile-approximate]
:::{include} /reference/data-analysis/aggregations/_snippets/search-aggregations-metrics-percentile-aggregation-approximate.md
:::
::::{warning}
`PERCENTILE` is also [non-deterministic](https://en.wikipedia.org/wiki/Nondeterministic_algorithm).
This means you can get slightly different results using the same data.
::::
% This is generated by ESQL's AbstractFunctionTestCase. Do not edit it. See ../README.md for how to regenerate it.
::::{warning}
This can use a significant amount of memory and ES|QL doesn't yet
grow aggregations beyond memory. So this aggregation will work until
it is used to collect more values than can fit into memory. Once it
collects too many values it will fail the query with
a [Circuit Breaker Error](docs-content://troubleshoot/elasticsearch/circuit-breaker-errors.md).
::::
## `ASIN` [esql-asin]
**Syntax**
:::{image} ../../../../../images/asin.svg
:alt: Embedded
:class: text-center
:::
**Parameters**
`number`
: Number between -1 and 1. If `null`, the function returns `null`.
**Description**
Returns the [arcsine](https://en.wikipedia.org/wiki/Inverse_trigonometric_functions) of the input numeric expression as an angle, expressed in radians.
**Supported types**
| number | result |
| --- | --- |
| double | double |
| integer | double |
| long | double |
| unsigned_long | double |
**Example**
```esql
ROW a=.9
| EVAL asin=ASIN(a)
```
| a:double | asin:double |
| --- | --- |
| .9 | 1.1197695149986342 |
## `ATAN` [esql-atan]
**Syntax**
:::{image} ../../../../../images/atan.svg
:alt: Embedded
:class: text-center
:::
**Parameters**
`number`
: Numeric expression. If `null`, the function returns `null`.
**Description**
Returns the [arctangent](https://en.wikipedia.org/wiki/Inverse_trigonometric_functions) of the input numeric expression as an angle, expressed in radians.
**Supported types**
| number | result |
| --- | --- |
| double | double |
| integer | double |
| long | double |
| unsigned_long | double |
**Example**
```esql
ROW a=12.9
| EVAL atan=ATAN(a)
```
| a:double | atan:double |
| --- | --- |
| 12.9 | 1.4934316673669235 |
## `ATAN2` [esql-atan2]
**Syntax**
:::{image} ../../../../../images/atan2.svg
:alt: Embedded
:class: text-center
:::
**Parameters**
`y_coordinate`
: y coordinate. If `null`, the function returns `null`.
`x_coordinate`
: x coordinate. If `null`, the function returns `null`.
**Description**
The [angle](https://en.wikipedia.org/wiki/Atan2) between the positive x-axis and the ray from the origin to the point (x , y) in the Cartesian plane, expressed in radians.
**Supported types**
| y_coordinate | x_coordinate | result |
| --- | --- | --- |
| double | double | double |
| double | integer | double |
| double | long | double |
| double | unsigned_long | double |
| integer | double | double |
| integer | integer | double |
| integer | long | double |
| integer | unsigned_long | double |
| long | double | double |
| long | integer | double |
| long | long | double |
| long | unsigned_long | double |
| unsigned_long | double | double |
| unsigned_long | integer | double |
| unsigned_long | long | double |
| unsigned_long | unsigned_long | double |
**Example**
```esql
ROW y=12.9, x=.6
| EVAL atan2=ATAN2(y, x)
```
| y:double | x:double | atan2:double |
| --- | --- | --- |
| 12.9 | 0.6 | 1.5243181954438936 |
## `AVG` [esql-avg]
**Syntax**
:::{image} ../../../../../images/avg.svg
:alt: Embedded
:class: text-center
:::
**Parameters**
`number`
: Numeric expression. If `null`, the function returns `null`.
**Description**
The average of a numeric field.
**Supported types**
| number | result |
| --- | --- |
| double | double |
| integer | double |
| long | double |
**Examples**
```esql
FROM employees
| STATS AVG(height)
```
| AVG(height):double |
| --- |
| 1.7682 |
The expression can use inline functions. For example, to calculate the average over a multivalued column, first use `MV_AVG` to average the multiple values per row, and use the result with the `AVG` function
```esql
FROM employees
| STATS avg_salary_change = ROUND(AVG(MV_AVG(salary_change)), 10)
```
| avg_salary_change:double |
| --- |
| 1.3904535865 |
## `BIT_LENGTH` [esql-bit_length]
**Syntax**
:::{image} ../../../../../images/bit_length.svg
:alt: Embedded
:class: text-center
:::
**Parameters**
`string`
: String expression. If `null`, the function returns `null`.
**Description**
Returns the bit length of a string.
::::{note}
All strings are in UTF-8, so a single character can use multiple bytes.
::::
**Supported types**
| string | result |
| --- | --- |
| keyword | integer |
| text | integer |
**Example**
```esql
FROM airports
| WHERE country == "India"
| KEEP city
| EVAL fn_length = LENGTH(city), fn_bit_length = BIT_LENGTH(city)
```
| city:keyword | fn_length:integer | fn_bit_length:integer |
| --- | --- | --- |
| Agwār | 5 | 48 |
| Ahmedabad | 9 | 72 |
| Bangalore | 9 | 72 |
## `BUCKET` [esql-bucket]
**Syntax**
:::{image} ../../../../../images/bucket.svg
:alt: Embedded
:class: text-center
:::
**Parameters**
`field`
: Numeric or date expression from which to derive buckets.
`buckets`
: Target number of buckets, or desired bucket size if `from` and `to` parameters are omitted.
`from`
: Start of the range. Can be a number, a date or a date expressed as a string.
`to`
: End of the range. Can be a number, a date or a date expressed as a string.
**Description**
Creates groups of values - buckets - out of a datetime or numeric input. The size of the buckets can either be provided directly, or chosen based on a recommended count and values range.
**Supported types**
| field | buckets | from | to | result |
| --- | --- | --- | --- | --- |
| date | date_period | | | date |
| date | integer | date | date | date |
| date | integer | date | keyword | date |
| date | integer | date | text | date |
| date | integer | keyword | date | date |
| date | integer | keyword | keyword | date |
| date | integer | keyword | text | date |
| date | integer | text | date | date |
| date | integer | text | keyword | date |
| date | integer | text | text | date |
| date | time_duration | | | date |
| date_nanos | date_period | | | date_nanos |
| date_nanos | integer | date | date | date_nanos |
| date_nanos | integer | date | keyword | date_nanos |
| date_nanos | integer | date | text | date_nanos |
| date_nanos | integer | keyword | date | date_nanos |
| date_nanos | integer | keyword | keyword | date_nanos |
| date_nanos | integer | keyword | text | date_nanos |
| date_nanos | integer | text | date | date_nanos |
| date_nanos | integer | text | keyword | date_nanos |
| date_nanos | integer | text | text | date_nanos |
| date_nanos | time_duration | | | date_nanos |
| double | double | | | double |
| double | integer | double | double | double |
| double | integer | double | integer | double |
| double | integer | double | long | double |
| double | integer | integer | double | double |
| double | integer | integer | integer | double |
| double | integer | integer | long | double |
| double | integer | long | double | double |
| double | integer | long | integer | double |
| double | integer | long | long | double |
| double | integer | | | double |
| double | long | | | double |
| integer | double | | | double |
| integer | integer | double | double | double |
| integer | integer | double | integer | double |
| integer | integer | double | long | double |
| integer | integer | integer | double | double |
| integer | integer | integer | integer | double |
| integer | integer | integer | long | double |
| integer | integer | long | double | double |
| integer | integer | long | integer | double |
| integer | integer | long | long | double |
| integer | integer | | | double |
| integer | long | | | double |
| long | double | | | double |
| long | integer | double | double | double |
| long | integer | double | integer | double |
| long | integer | double | long | double |
| long | integer | integer | double | double |
| long | integer | integer | integer | double |
| long | integer | integer | long | double |
| long | integer | long | double | double |
| long | integer | long | integer | double |
| long | integer | long | long | double |
| long | integer | | | double |
| long | long | | | double |
**Examples**
`BUCKET` can work in two modes: one in which the bucket size is computed from a recommended bucket count and a range (four parameters), and another in which the bucket size is provided directly (two parameters).
Using a target number of buckets, a start of a range, and an end of a range, `BUCKET` picks an appropriate bucket size to generate the target number of buckets or fewer. For example, asking for at most 20 buckets over a year results in monthly buckets:
```esql
FROM employees
| WHERE hire_date >= "1985-01-01T00:00:00Z" AND hire_date < "1986-01-01T00:00:00Z"
| STATS hire_date = MV_SORT(VALUES(hire_date)) BY month = BUCKET(hire_date, 20, "1985-01-01T00:00:00Z", "1986-01-01T00:00:00Z")
| SORT hire_date
```
| hire_date:date | month:date |
| --- | --- |
| [1985-02-18T00:00:00.000Z, 1985-02-24T00:00:00.000Z] | 1985-02-01T00:00:00.000Z |
| 1985-05-13T00:00:00.000Z | 1985-05-01T00:00:00.000Z |
| 1985-07-09T00:00:00.000Z | 1985-07-01T00:00:00.000Z |
| 1985-09-17T00:00:00.000Z | 1985-09-01T00:00:00.000Z |
| [1985-10-14T00:00:00.000Z, 1985-10-20T00:00:00.000Z] | 1985-10-01T00:00:00.000Z |
| [1985-11-19T00:00:00.000Z, 1985-11-20T00:00:00.000Z, 1985-11-21T00:00:00.000Z] | 1985-11-01T00:00:00.000Z |
The goal isn't to provide **exactly** the target number of buckets; it's to pick a range that people are comfortable with that provides at most the target number of buckets.
Combine `BUCKET` with an [aggregation](../../esql-functions-operators.md#esql-agg-functions) to create a histogram:
```esql
FROM employees
| WHERE hire_date >= "1985-01-01T00:00:00Z" AND hire_date < "1986-01-01T00:00:00Z"
| STATS hires_per_month = COUNT(*) BY month = BUCKET(hire_date, 20, "1985-01-01T00:00:00Z", "1986-01-01T00:00:00Z")
| SORT month
```
| hires_per_month:long | month:date |
| --- | --- |
| 2 | 1985-02-01T00:00:00.000Z |
| 1 | 1985-05-01T00:00:00.000Z |
| 1 | 1985-07-01T00:00:00.000Z |
| 1 | 1985-09-01T00:00:00.000Z |
| 2 | 1985-10-01T00:00:00.000Z |
| 4 | 1985-11-01T00:00:00.000Z |
::::{note}
`BUCKET` does not create buckets that don't match any documents. That's why this example is missing `1985-03-01` and other dates.
::::
Asking for more buckets can result in a smaller range. For example, asking for at most 100 buckets in a year results in weekly buckets:
```esql
FROM employees
| WHERE hire_date >= "1985-01-01T00:00:00Z" AND hire_date < "1986-01-01T00:00:00Z"
| STATS hires_per_week = COUNT(*) BY week = BUCKET(hire_date, 100, "1985-01-01T00:00:00Z", "1986-01-01T00:00:00Z")
| SORT week
```
| hires_per_week:long | week:date |
| --- | --- |
| 2 | 1985-02-18T00:00:00.000Z |
| 1 | 1985-05-13T00:00:00.000Z |
| 1 | 1985-07-08T00:00:00.000Z |
| 1 | 1985-09-16T00:00:00.000Z |
| 2 | 1985-10-14T00:00:00.000Z |
| 4 | 1985-11-18T00:00:00.000Z |
::::{note}
`BUCKET` does not filter any rows. It only uses the provided range to pick a good bucket size. For rows with a value outside of the range, it returns a bucket value that corresponds to a bucket outside the range. Combine `BUCKET` with [`WHERE`](/reference/query-languages/esql/esql-commands.md#esql-where) to filter rows.
::::
If the desired bucket size is known in advance, simply provide it as the second argument, leaving the range out:
```esql
FROM employees
| WHERE hire_date >= "1985-01-01T00:00:00Z" AND hire_date < "1986-01-01T00:00:00Z"
| STATS hires_per_week = COUNT(*) BY week = BUCKET(hire_date, 1 week)
| SORT week
```
| hires_per_week:long | week:date |
| --- | --- |
| 2 | 1985-02-18T00:00:00.000Z |
| 1 | 1985-05-13T00:00:00.000Z |
| 1 | 1985-07-08T00:00:00.000Z |
| 1 | 1985-09-16T00:00:00.000Z |
| 2 | 1985-10-14T00:00:00.000Z |
| 4 | 1985-11-18T00:00:00.000Z |
::::{note}
When providing the bucket size as the second parameter, it must be a time duration or date period.
::::
`BUCKET` can also operate on numeric fields. For example, to create a salary histogram:
```esql
FROM employees
| STATS COUNT(*) by bs = BUCKET(salary, 20, 25324, 74999)
| SORT bs
```
| COUNT(*):long | bs:double |
| --- | --- |
| 9 | 25000.0 |
| 9 | 30000.0 |
| 18 | 35000.0 |
| 11 | 40000.0 |
| 11 | 45000.0 |
| 10 | 50000.0 |
| 7 | 55000.0 |
| 9 | 60000.0 |
| 8 | 65000.0 |
| 8 | 70000.0 |
Unlike the earlier example that intentionally filters on a date range, you rarely want to filter on a numeric range. You have to find the `min` and `max` separately. {{esql}} doesn't yet have an easy way to do that automatically.
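As a sketch, one way to find those bounds before choosing a range (column names follow the `employees` examples above):

```esql
FROM employees
| STATS min = MIN(salary), max = MAX(salary)
```

The resulting values can then be plugged in as the `from` and `to` arguments of the four-parameter form of `BUCKET`.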
The range can be omitted if the desired bucket size is known in advance. Simply provide it as the second argument:
```esql
FROM employees
| WHERE hire_date >= "1985-01-01T00:00:00Z" AND hire_date < "1986-01-01T00:00:00Z"
| STATS c = COUNT(1) BY b = BUCKET(salary, 5000.)
| SORT b
```
| c:long | b:double |
| --- | --- |
| 1 | 25000.0 |
| 1 | 30000.0 |
| 1 | 40000.0 |
| 2 | 45000.0 |
| 2 | 50000.0 |
| 1 | 55000.0 |
| 1 | 60000.0 |
| 1 | 65000.0 |
| 1 | 70000.0 |
Create hourly buckets for the last 24 hours, and calculate the number of events per hour:
```esql
FROM sample_data
| WHERE @timestamp >= NOW() - 1 day and @timestamp < NOW()
| STATS COUNT(*) BY bucket = BUCKET(@timestamp, 25, NOW() - 1 day, NOW())
```
| COUNT(*):long | bucket:date |
| --- | --- |
Create monthly buckets for the year 1985, and calculate the average salary by hiring month:
```esql
FROM employees
| WHERE hire_date >= "1985-01-01T00:00:00Z" AND hire_date < "1986-01-01T00:00:00Z"
| STATS AVG(salary) BY bucket = BUCKET(hire_date, 20, "1985-01-01T00:00:00Z", "1986-01-01T00:00:00Z")
| SORT bucket
```
| AVG(salary):double | bucket:date |
| --- | --- |
| 46305.0 | 1985-02-01T00:00:00.000Z |
| 44817.0 | 1985-05-01T00:00:00.000Z |
| 62405.0 | 1985-07-01T00:00:00.000Z |
| 49095.0 | 1985-09-01T00:00:00.000Z |
| 51532.0 | 1985-10-01T00:00:00.000Z |
| 54539.75 | 1985-11-01T00:00:00.000Z |
`BUCKET` may be used in both the aggregating and grouping part of the [STATS ... BY ...](/reference/query-languages/esql/esql-commands.md#esql-stats-by) command provided that in the aggregating part the function is referenced by an alias defined in the grouping part, or that it is invoked with the exact same expression:
```esql
FROM employees
| STATS s1 = b1 + 1, s2 = BUCKET(salary / 1000 + 999, 50.) + 2 BY b1 = BUCKET(salary / 100 + 99, 50.), b2 = BUCKET(salary / 1000 + 999, 50.)
| SORT b1, b2
| KEEP s1, b1, s2, b2
```
| s1:double | b1:double | s2:double | b2:double |
| --- | --- | --- | --- |
| 351.0 | 350.0 | 1002.0 | 1000.0 |
| 401.0 | 400.0 | 1002.0 | 1000.0 |
| 451.0 | 450.0 | 1002.0 | 1000.0 |
| 501.0 | 500.0 | 1002.0 | 1000.0 |
| 551.0 | 550.0 | 1002.0 | 1000.0 |
| 601.0 | 600.0 | 1002.0 | 1000.0 |
| 601.0 | 600.0 | 1052.0 | 1050.0 |
| 651.0 | 650.0 | 1052.0 | 1050.0 |
| 701.0 | 700.0 | 1052.0 | 1050.0 |
| 751.0 | 750.0 | 1052.0 | 1050.0 |
| 801.0 | 800.0 | 1052.0 | 1050.0 |
Sometimes you need to change the start value of each bucket by a given duration (similar to date histogram aggregations [`offset`](/reference/data-analysis/aggregations/search-aggregations-bucket-histogram-aggregation.md) parameter). To do so, you will need to take into account how the language handles expressions within the `STATS` command: if these contain functions or arithmetic operators, a virtual `EVAL` is inserted before and/or after the `STATS` command. Consequently, a double compensation is needed to adjust the bucketed date value before the aggregation and then again after. For instance, inserting a negative offset of `1 hour` to buckets of `1 year` looks like this:
```esql
FROM employees
| STATS dates = MV_SORT(VALUES(birth_date)) BY b = BUCKET(birth_date + 1 HOUR, 1 YEAR) - 1 HOUR
| EVAL d_count = MV_COUNT(dates)
| SORT d_count, b
| LIMIT 3
```
| dates:date | b:date | d_count:integer |
| --- | --- | --- |
| 1965-01-03T00:00:00.000Z | 1964-12-31T23:00:00.000Z | 1 |
| [1955-01-21T00:00:00.000Z, 1955-08-20T00:00:00.000Z, 1955-08-28T00:00:00.000Z, 1955-10-04T00:00:00.000Z] | 1954-12-31T23:00:00.000Z | 4 |
| [1957-04-04T00:00:00.000Z, 1957-05-23T00:00:00.000Z, 1957-05-25T00:00:00.000Z, 1957-12-03T00:00:00.000Z] | 1956-12-31T23:00:00.000Z | 4 |
## `BYTE_LENGTH` [esql-byte_length]
**Syntax**
:::{image} ../../../../../images/byte_length.svg
:alt: Embedded
:class: text-center
:::
**Parameters**
`string`
: String expression. If `null`, the function returns `null`.
**Description**
Returns the byte length of a string.
::::{note}
All strings are in UTF-8, so a single character can use multiple bytes.
::::
**Supported types**
| string | result |
| --- | --- |
| keyword | integer |
| text | integer |
**Example**
```esql
FROM airports
| WHERE country == "India"
| KEEP city
| EVAL fn_length = LENGTH(city), fn_byte_length = BYTE_LENGTH(city)
```
| city:keyword | fn_length:integer | fn_byte_length:integer |
| --- | --- | --- |
| Agwār | 5 | 6 |
| Ahmedabad | 9 | 9 |
| Bangalore | 9 | 9 |
## `CASE` [esql-case]
**Syntax**
:::{image} ../../../../../images/case.svg
:alt: Embedded
:class: text-center
:::
**Parameters**
`condition`
: A condition.
`trueValue`
: The value that's returned when the corresponding condition is the first to evaluate to `true`. The default value is returned when no condition matches.
`elseValue`
: The value that's returned when no condition evaluates to `true`.
**Description**
Accepts pairs of conditions and values. The function returns the value that belongs to the first condition that evaluates to `true`. If the number of arguments is odd, the last argument is the default value which is returned when no condition matches. If the number of arguments is even, and no condition matches, the function returns `null`.
**Supported types**
| condition | trueValue | elseValue | result |
| --- | --- | --- | --- |
| boolean | boolean | boolean | boolean |
| boolean | boolean | | boolean |
| boolean | cartesian_point | cartesian_point | cartesian_point |
| boolean | cartesian_point | | cartesian_point |
| boolean | cartesian_shape | cartesian_shape | cartesian_shape |
| boolean | cartesian_shape | | cartesian_shape |
| boolean | date | date | date |
| boolean | date | | date |
| boolean | date_nanos | date_nanos | date_nanos |
| boolean | date_nanos | | date_nanos |
| boolean | double | double | double |
| boolean | double | | double |
| boolean | geo_point | geo_point | geo_point |
| boolean | geo_point | | geo_point |
| boolean | geo_shape | geo_shape | geo_shape |
| boolean | geo_shape | | geo_shape |
| boolean | integer | integer | integer |
| boolean | integer | | integer |
| boolean | ip | ip | ip |
| boolean | ip | | ip |
| boolean | keyword | keyword | keyword |
| boolean | keyword | text | keyword |
| boolean | keyword | | keyword |
| boolean | long | long | long |
| boolean | long | | long |
| boolean | text | keyword | keyword |
| boolean | text | text | keyword |
| boolean | text | | keyword |
| boolean | unsigned_long | unsigned_long | unsigned_long |
| boolean | unsigned_long | | unsigned_long |
| boolean | version | version | version |
| boolean | version | | version |
**Examples**
Determine whether employees are monolingual, bilingual, or polyglot:
```esql
FROM employees
| EVAL type = CASE(
languages <= 1, "monolingual",
languages <= 2, "bilingual",
"polyglot")
| KEEP emp_no, languages, type
```
| emp_no:integer | languages:integer | type:keyword |
| --- | --- | --- |
| 10001 | 2 | bilingual |
| 10002 | 5 | polyglot |
| 10003 | 4 | polyglot |
| 10004 | 5 | polyglot |
| 10005 | 1 | monolingual |
Calculate the total connection success rate based on log messages:
```esql
FROM sample_data
| EVAL successful = CASE(
STARTS_WITH(message, "Connected to"), 1,
message == "Connection error", 0
)
| STATS success_rate = AVG(successful)
```
| success_rate:double |
| --- |
| 0.5 |
Calculate an hourly error rate as a percentage of the total number of log messages:
```esql
FROM sample_data
| EVAL error = CASE(message LIKE "*error*", 1, 0)
| EVAL hour = DATE_TRUNC(1 hour, @timestamp)
| STATS error_rate = AVG(error) by hour
| SORT hour
```
| error_rate:double | hour:date |
| --- | --- |
| 0.0 | 2023-10-23T12:00:00.000Z |
| 0.6 | 2023-10-23T13:00:00.000Z |
## `CATEGORIZE` [esql-categorize]
::::{warning}
Do not use in production environments. This functionality is in technical preview and may be changed or removed in a future release. Elastic will work to fix any issues, but features in technical preview are not subject to the support SLA of official GA features.
::::
**Syntax**
:::{image} ../../../../../images/categorize.svg
:alt: Embedded
:class: text-center
:::
**Parameters**
`field`
: Expression to categorize
**Description**
Groups text messages into categories of similarly formatted text values.
`CATEGORIZE` has the following limitations:
* can't be used within other expressions
* can't be used with multiple groupings
* can't be used or referenced within aggregate functions
**Supported types**
| field | result |
| --- | --- |
| keyword | keyword |
| text | keyword |
**Example**
This example categorizes server logs messages into categories and aggregates their counts.
```esql
FROM sample_data
| STATS count=COUNT() BY category=CATEGORIZE(message)
```
| count:long | category:keyword |
| --- | --- |
| 3 | .*?Connected.+?to.*? |
| 3 | .*?Connection.+?error.*? |
| 1 | .*?Disconnected.*? |
## `CBRT` [esql-cbrt]
**Syntax**
:::{image} ../../../../../images/cbrt.svg
:alt: Embedded
:class: text-center
:::
**Parameters**
`number`
: Numeric expression. If `null`, the function returns `null`.
**Description**
Returns the cube root of a number. The input can be any numeric value, the return value is always a double. Cube roots of infinities are null.
**Supported types**
| number | result |
| --- | --- |
| double | double |
| integer | double |
| long | double |
| unsigned_long | double |
**Example**
```esql
ROW d = 1000.0
| EVAL c = cbrt(d)
```
| d:double | c:double |
| --- | --- |
| 1000.0 | 10.0 |
## `CEIL` [esql-ceil]
**Syntax**
:::{image} ../../../../../images/ceil.svg
:alt: Embedded
:class: text-center
:::
**Parameters**
`number`
: Numeric expression. If `null`, the function returns `null`.
**Description**
Round a number up to the nearest integer.
::::{note}
This is a noop for `long` (including unsigned) and `integer`. For `double` this picks the closest `double` value to the integer similar to [Math.ceil](https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/lang/Math.html#ceil(double)).
::::
**Supported types**
| number | result |
| --- | --- |
| double | double |
| integer | integer |
| long | long |
| unsigned_long | unsigned_long |
**Example**
```esql
ROW a=1.8
| EVAL a=CEIL(a)
```
| a:double |
| --- |
| 2 |
## `CIDR_MATCH` [esql-cidr_match]
**Syntax**
:::{image} ../../../../../images/cidr_match.svg
:alt: Embedded
:class: text-center
:::
**Parameters**
`ip`
: IP address of type `ip` (both IPv4 and IPv6 are supported).
`blockX`
: CIDR block to test the IP against.
**Description**
Returns true if the provided IP is contained in one of the provided CIDR blocks.
**Supported types**
| ip | blockX | result |
| --- | --- | --- |
| ip | keyword | boolean |
| ip | text | boolean |
**Example**
```esql
FROM hosts
| WHERE CIDR_MATCH(ip1, "127.0.0.2/32", "127.0.0.3/32")
| KEEP card, host, ip0, ip1
```
| card:keyword | host:keyword | ip0:ip | ip1:ip |
| --- | --- | --- | --- |
| eth1 | beta | 127.0.0.1 | 127.0.0.2 |
| eth0 | gamma | fe80::cae2:65ff:fece:feb9 | 127.0.0.3 |
## `COALESCE` [esql-coalesce]
**Syntax**
:::{image} ../../../../../images/coalesce.svg
:alt: Embedded
:class: text-center
:::
**Parameters**
`first`
: Expression to evaluate.
`rest`
: Other expression to evaluate.
**Description**
Returns the first of its arguments that is not null. If all arguments are null, it returns `null`.
**Supported types**
| first | rest | result |
| --- | --- | --- |
| boolean | boolean | boolean |
| boolean | | boolean |
| cartesian_point | cartesian_point | cartesian_point |
| cartesian_shape | cartesian_shape | cartesian_shape |
| date | date | date |
| date_nanos | date_nanos | date_nanos |
| geo_point | geo_point | geo_point |
| geo_shape | geo_shape | geo_shape |
| integer | integer | integer |
| integer | | integer |
| ip | ip | ip |
| keyword | keyword | keyword |
| keyword | | keyword |
| long | long | long |
| long | | long |
| text | text | keyword |
| text | | keyword |
| version | version | version |
**Example**
```esql
ROW a=null, b="b"
| EVAL COALESCE(a, b)
```
| a:null | b:keyword | COALESCE(a, b):keyword |
| --- | --- | --- |
| null | b | b |
## `CONCAT` [esql-concat]
**Syntax**
:::{image} ../../../../../images/concat.svg
:alt: Embedded
:class: text-center
:::
**Parameters**
`string1`
: Strings to concatenate.
`string2`
: Strings to concatenate.
**Description**
Concatenates two or more strings.
**Supported types**
| string1 | string2 | result |
| --- | --- | --- |
| keyword | keyword | keyword |
| keyword | text | keyword |
| text | keyword | keyword |
| text | text | keyword |
**Example**
```esql
FROM employees
| KEEP first_name, last_name
| EVAL fullname = CONCAT(first_name, " ", last_name)
```
| first_name:keyword | last_name:keyword | fullname:keyword |
| --- | --- | --- |
| Alejandro | McAlpine | Alejandro McAlpine |
| Amabile | Gomatam | Amabile Gomatam |
| Anneke | Preusig | Anneke Preusig |
## `COS` [esql-cos]
**Syntax**
:::{image} ../../../../../images/cos.svg
:alt: Embedded
:class: text-center
:::
**Parameters**
`angle`
: An angle, in radians. If `null`, the function returns `null`.
**Description**
Returns the [cosine](https://en.wikipedia.org/wiki/Sine_and_cosine) of an angle.
**Supported types**
| angle | result |
| --- | --- |
| double | double |
| integer | double |
| long | double |
| unsigned_long | double |
**Example**
```esql
ROW a=1.8
| EVAL cos=COS(a)
```
| a:double | cos:double |
| --- | --- |
| 1.8 | -0.2272020946930871 |
## `COSH` [esql-cosh]
**Syntax**
:::{image} ../../../../../images/cosh.svg
:alt: Embedded
:class: text-center
:::
**Parameters**
`number`
: Numeric expression. If `null`, the function returns `null`.
**Description**
Returns the [hyperbolic cosine](https://en.wikipedia.org/wiki/Hyperbolic_functions) of a number.
**Supported types**
| number | result |
| --- | --- |
| double | double |
| integer | double |
| long | double |
| unsigned_long | double |
**Example**
```esql
ROW a=1.8
| EVAL cosh=COSH(a)
```
| a:double | cosh:double |
| --- | --- |
| 1.8 | 3.1074731763172667 |
## `COUNT` [esql-count]
**Syntax**
:::{image} ../../../../../images/count.svg
:alt: Embedded
:class: text-center
:::
**Parameters**
`field`
: Expression that outputs values to be counted. If omitted, equivalent to `COUNT(*)` (the number of rows).
**Description**
Returns the total number (count) of input values.
**Supported types**
| field | result |
| --- | --- |
| boolean | long |
| cartesian_point | long |
| date | long |
| double | long |
| geo_point | long |
| integer | long |
| ip | long |
| keyword | long |
| long | long |
| text | long |
| unsigned_long | long |
| version | long |
**Examples**
```esql
FROM employees
| STATS COUNT(height)
```
| COUNT(height):long |
| --- |
| 100 |
To count the number of rows, use `COUNT()` or `COUNT(*)`
```esql
FROM employees
| STATS count = COUNT(*) BY languages
| SORT languages DESC
```
| count:long | languages:integer |
| --- | --- |
| 10 | null |
| 21 | 5 |
| 18 | 4 |
| 17 | 3 |
| 19 | 2 |
| 15 | 1 |
The expression can use inline functions. This example splits a string into multiple values using the `SPLIT` function and counts the values
```esql
ROW words="foo;bar;baz;qux;quux;foo"
| STATS word_count = COUNT(SPLIT(words, ";"))
```
| word_count:long |
| --- |
| 6 |
To count the number of times an expression returns `TRUE` use a [`WHERE`](/reference/query-languages/esql/esql-commands.md#esql-where) command to remove rows that shouldnt be included
```esql
ROW n=1
| WHERE n < 0
| STATS COUNT(n)
```
| COUNT(n):long |
| --- |
| 0 |
To count the same stream of data based on two different expressions, use the pattern `COUNT(<expression> OR NULL)`. This builds on the three-valued logic ([3VL](https://en.wikipedia.org/wiki/Three-valued_logic)) of the language: `TRUE OR NULL` is `TRUE`, but `FALSE OR NULL` is `NULL`, plus the way COUNT handles `NULL`s: `COUNT(TRUE)` and `COUNT(FALSE)` are both 1, but `COUNT(NULL)` is 0.
```esql
ROW n=1
| STATS COUNT(n > 0 OR NULL), COUNT(n < 0 OR NULL)
```
| COUNT(n > 0 OR NULL):long | COUNT(n < 0 OR NULL):long |
| --- | --- |
| 1 | 0 |
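The `COUNT(<expression> OR NULL)` pattern can be sketched outside {{esql}} by modeling `NULL` as Python's `None` (a hypothetical illustration of the three-valued logic, not the engine's implementation):

```python
def or_3vl(a, b):
    """Three-valued OR: TRUE OR NULL is TRUE, FALSE OR NULL is NULL (None)."""
    if a is True or b is True:
        return True
    if a is None or b is None:
        return None
    return False

def count(values):
    """COUNT counts every non-NULL value: TRUE and FALSE both count, None does not."""
    return sum(1 for v in values if v is not None)

rows = [1]  # ROW n=1
print(count([or_3vl(n > 0, None) for n in rows]))  # 1
print(count([or_3vl(n < 0, None) for n in rows]))  # 0
```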
## `COUNT_DISTINCT` [esql-count_distinct]
**Syntax**
:::{image} ../../../../../images/count_distinct.svg
:alt: Embedded
:class: text-center
:::
**Parameters**
`field`
: Column or literal for which to count the number of distinct values.
`precision`
: Precision threshold. Refer to [Counts are approximate](/reference/query-languages/esql/esql-functions-operators.md#esql-agg-count-distinct-approximate). The maximum supported value is 40000. Thresholds above this number will have the same effect as a threshold of 40000. The default value is 3000.
**Description**
Returns the approximate number of distinct values.
**Supported types**
| field | precision | result |
| --- | --- | --- |
| boolean | integer | long |
| boolean | long | long |
| boolean | unsigned_long | long |
| boolean | | long |
| date | integer | long |
| date | long | long |
| date | unsigned_long | long |
| date | | long |
| date_nanos | integer | long |
| date_nanos | long | long |
| date_nanos | unsigned_long | long |
| date_nanos | | long |
| double | integer | long |
| double | long | long |
| double | unsigned_long | long |
| double | | long |
| integer | integer | long |
| integer | long | long |
| integer | unsigned_long | long |
| integer | | long |
| ip | integer | long |
| ip | long | long |
| ip | unsigned_long | long |
| ip | | long |
| keyword | integer | long |
| keyword | long | long |
| keyword | unsigned_long | long |
| keyword | | long |
| long | integer | long |
| long | long | long |
| long | unsigned_long | long |
| long | | long |
| text | integer | long |
| text | long | long |
| text | unsigned_long | long |
| text | | long |
| version | integer | long |
| version | long | long |
| version | unsigned_long | long |
| version | | long |
**Examples**
```esql
FROM hosts
| STATS COUNT_DISTINCT(ip0), COUNT_DISTINCT(ip1)
```
| COUNT_DISTINCT(ip0):long | COUNT_DISTINCT(ip1):long |
| --- | --- |
| 7 | 8 |
With the optional second parameter to configure the precision threshold
```esql
FROM hosts
| STATS COUNT_DISTINCT(ip0, 80000), COUNT_DISTINCT(ip1, 5)
```
| COUNT_DISTINCT(ip0, 80000):long | COUNT_DISTINCT(ip1, 5):long |
| --- | --- |
| 7 | 9 |
The expression can use inline functions. This example splits a string into multiple values using the `SPLIT` function and counts the unique values
```esql
ROW words="foo;bar;baz;qux;quux;foo"
| STATS distinct_word_count = COUNT_DISTINCT(SPLIT(words, ";"))
```
| distinct_word_count:long |
| --- |
| 5 |
### Counts are approximate [esql-agg-count-distinct-approximate]
Computing exact counts requires loading values into a set and returning its size. This doesnt scale when working on high-cardinality sets and/or large values as the required memory usage and the need to communicate those per-shard sets between nodes would utilize too many resources of the cluster.
This `COUNT_DISTINCT` function is based on the [HyperLogLog++](https://static.googleusercontent.com/media/research.google.com/fr//pubs/archive/40671.pdf) algorithm, which counts based on the hashes of the values with some interesting properties:
* configurable precision, which decides on how to trade memory for accuracy,
* excellent accuracy on low-cardinality sets,
* fixed memory usage: no matter if there are tens or billions of unique values, memory usage only depends on the configured precision.
For a precision threshold of `c`, the implementation that we are using requires about `c * 8` bytes.
The following chart shows how the error varies before and after the threshold:
![cardinality error](/images/cardinality_error.png "")
For all 3 thresholds, counts have been accurate up to the configured threshold. Although not guaranteed, this is likely to be the case. Accuracy in practice depends on the dataset in question. In general, most datasets show consistently good accuracy. Also note that even with a threshold as low as 100, the error remains very low (1-6% as seen in the above graph) even when counting millions of items.
The HyperLogLog++ algorithm depends on the leading zeros of hashed values; the exact distribution of hashes in a dataset can affect the accuracy of the cardinality.
The `COUNT_DISTINCT` function takes an optional second parameter to configure the precision threshold. The `precision_threshold` option allows trading memory for accuracy, and defines a unique count below which counts are expected to be close to accurate. Above this value, counts might become a bit more fuzzy. The maximum supported value is 40000; thresholds above this number will have the same effect as a threshold of 40000. The default value is `3000`.
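The memory/accuracy trade-off described above can be sketched as follows, assuming the `c * 8` bytes estimate and the 40000 cap (an illustration, not the actual implementation):

```python
def hll_memory_bytes(precision_threshold: int, max_threshold: int = 40000) -> int:
    """Approximate memory used by HyperLogLog++ for a given precision threshold."""
    c = min(precision_threshold, max_threshold)  # thresholds above the cap behave like the cap
    return c * 8

print(hll_memory_bytes(3000))   # default threshold: 24000 bytes
print(hll_memory_bytes(80000))  # capped at 40000: 320000 bytes
```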
## `DATE_DIFF` [esql-date_diff]
**Syntax**
:::{image} ../../../../../images/date_diff.svg
:alt: Embedded
:class: text-center
:::
**Parameters**
`unit`
: Time difference unit
`startTimestamp`
: A string representing a start timestamp
`endTimestamp`
: A string representing an end timestamp
**Description**
Subtracts the `startTimestamp` from the `endTimestamp` and returns the difference in multiples of `unit`. If `startTimestamp` is later than the `endTimestamp`, negative values are returned.
**Datetime difference units**
| unit | abbreviations |
| --- | --- |
| year | years, yy, yyyy |
| quarter | quarters, qq, q |
| month | months, mm, m |
| dayofyear | dy, y |
| day | days, dd, d |
| week | weeks, wk, ww |
| weekday | weekdays, dw |
| hour | hours, hh |
| minute | minutes, mi, n |
| second | seconds, ss, s |
| millisecond | milliseconds, ms |
| microsecond | microseconds, mcs |
| nanosecond | nanoseconds, ns |
Note that while there is an overlap between the function's supported units and {{esql}}'s supported time span literals, these sets are distinct and not interchangeable. Similarly, the supported abbreviations are conveniently shared with implementations of this function in other established products and not necessarily common with the date-time nomenclature used by {{es}}.
**Supported types**
| unit | startTimestamp | endTimestamp | result |
| --- | --- | --- | --- |
| keyword | date | date | integer |
| keyword | date | date_nanos | integer |
| keyword | date_nanos | date | integer |
| keyword | date_nanos | date_nanos | integer |
| text | date | date | integer |
| text | date | date_nanos | integer |
| text | date_nanos | date | integer |
| text | date_nanos | date_nanos | integer |
**Examples**
```esql
ROW date1 = TO_DATETIME("2023-12-02T11:00:00.000Z"), date2 = TO_DATETIME("2023-12-02T11:00:00.001Z")
| EVAL dd_ms = DATE_DIFF("microseconds", date1, date2)
```
| date1:date | date2:date | dd_ms:integer |
| --- | --- | --- |
| 2023-12-02T11:00:00.000Z | 2023-12-02T11:00:00.001Z | 1000 |
When subtracting in calendar units, like year or month, only the fully elapsed units are counted. To avoid this and also obtain remainders, switch to the next smaller unit and do the date math accordingly.
```esql
ROW end_23=TO_DATETIME("2023-12-31T23:59:59.999Z"),
start_24=TO_DATETIME("2024-01-01T00:00:00.000Z"),
end_24=TO_DATETIME("2024-12-31T23:59:59.999")
| EVAL end23_to_start24=DATE_DIFF("year", end_23, start_24)
| EVAL end23_to_end24=DATE_DIFF("year", end_23, end_24)
| EVAL start_to_end_24=DATE_DIFF("year", start_24, end_24)
```
| end_23:date | start_24:date | end_24:date | end23_to_start24:integer | end23_to_end24:integer | start_to_end_24:integer |
| --- | --- | --- | --- | --- | --- |
| 2023-12-31T23:59:59.999Z | 2024-01-01T00:00:00.000Z | 2024-12-31T23:59:59.999Z | 0 | 1 | 0 |
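The "fully elapsed units" behavior can be sketched in Python for the `year` unit (a simplified illustration assuming `start <= end`, not the actual implementation):

```python
from datetime import datetime

def full_years_between(start: datetime, end: datetime) -> int:
    # Count only fully elapsed calendar years, mirroring DATE_DIFF("year", start, end)
    years = end.year - start.year
    if (end.month, end.day, end.time()) < (start.month, start.day, start.time()):
        years -= 1  # the last year has not fully elapsed
    return years

end_23 = datetime(2023, 12, 31, 23, 59, 59, 999000)
start_24 = datetime(2024, 1, 1)
end_24 = datetime(2024, 12, 31, 23, 59, 59, 999000)
print(full_years_between(end_23, start_24))  # 0
print(full_years_between(end_23, end_24))    # 1
print(full_years_between(start_24, end_24))  # 0
```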
## `DATE_EXTRACT` [esql-date_extract]
**Syntax**
:::{image} ../../../../../images/date_extract.svg
:alt: Embedded
:class: text-center
:::
**Parameters**
`datePart`
: Part of the date to extract. Can be: `aligned_day_of_week_in_month`, `aligned_day_of_week_in_year`, `aligned_week_of_month`, `aligned_week_of_year`, `ampm_of_day`, `clock_hour_of_ampm`, `clock_hour_of_day`, `day_of_month`, `day_of_week`, `day_of_year`, `epoch_day`, `era`, `hour_of_ampm`, `hour_of_day`, `instant_seconds`, `micro_of_day`, `micro_of_second`, `milli_of_day`, `milli_of_second`, `minute_of_day`, `minute_of_hour`, `month_of_year`, `nano_of_day`, `nano_of_second`, `offset_seconds`, `proleptic_month`, `second_of_day`, `second_of_minute`, `year`, or `year_of_era`. Refer to [java.time.temporal.ChronoField](https://docs.oracle.com/javase/8/docs/api/java/time/temporal/ChronoField.html) for a description of these values. If `null`, the function returns `null`.
`date`
: Date expression. If `null`, the function returns `null`.
**Description**
Extracts parts of a date, like year, month, day, hour.
**Supported types**
| datePart | date | result |
| --- | --- | --- |
| keyword | date | long |
| keyword | date_nanos | long |
| text | date | long |
| text | date_nanos | long |
**Examples**
```esql
ROW date = DATE_PARSE("yyyy-MM-dd", "2022-05-06")
| EVAL year = DATE_EXTRACT("year", date)
```
| date:date | year:long |
| --- | --- |
| 2022-05-06T00:00:00.000Z | 2022 |
Find all events that occurred outside of business hours (before 9 AM or after 5 PM) on any given date:
```esql
FROM sample_data
| WHERE DATE_EXTRACT("hour_of_day", @timestamp) < 9 OR DATE_EXTRACT("hour_of_day", @timestamp) >= 17
```
| @timestamp:date | client_ip:ip | event_duration:long | message:keyword |
| --- | --- | --- | --- |
## `DATE_FORMAT` [esql-date_format]
**Syntax**
:::{image} ../../../../../images/date_format.svg
:alt: Embedded
:class: text-center
:::
**Parameters**
`dateFormat`
: Date format (optional). If no format is specified, the `yyyy-MM-dd'T'HH:mm:ss.SSSZ` format is used. If `null`, the function returns `null`.
`date`
: Date expression. If `null`, the function returns `null`.
**Description**
Returns a string representation of a date, in the provided format.
**Supported types**
| dateFormat | date | result |
| --- | --- | --- |
| date | | keyword |
| date_nanos | | keyword |
| keyword | date | keyword |
| keyword | date_nanos | keyword |
| text | date | keyword |
| text | date_nanos | keyword |
**Example**
```esql
FROM employees
| KEEP first_name, last_name, hire_date
| EVAL hired = DATE_FORMAT("yyyy-MM-dd", hire_date)
```
| first_name:keyword | last_name:keyword | hire_date:date | hired:keyword |
| --- | --- | --- | --- |
| Alejandro | McAlpine | 1991-06-26T00:00:00.000Z | 1991-06-26 |
| Amabile | Gomatam | 1992-11-18T00:00:00.000Z | 1992-11-18 |
| Anneke | Preusig | 1989-06-02T00:00:00.000Z | 1989-06-02 |
## `DATE_PARSE` [esql-date_parse]
**Syntax**
:::{image} ../../../../../images/date_parse.svg
:alt: Embedded
:class: text-center
:::
**Parameters**
`datePattern`
: The date format. Refer to the [`DateTimeFormatter` documentation](https://docs.oracle.com/en/java/javase/14/docs/api/java.base/java/time/format/DateTimeFormatter.html) for the syntax. If `null`, the function returns `null`.
`dateString`
: Date expression as a string. If `null` or an empty string, the function returns `null`.
**Description**
Returns a date by parsing the second argument using the format specified in the first argument.
**Supported types**
| datePattern | dateString | result |
| --- | --- | --- |
| keyword | keyword | date |
| keyword | text | date |
| text | keyword | date |
| text | text | date |
**Example**
```esql
ROW date_string = "2022-05-06"
| EVAL date = DATE_PARSE("yyyy-MM-dd", date_string)
```
| date_string:keyword | date:date |
| --- | --- |
| 2022-05-06 | 2022-05-06T00:00:00.000Z |
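For comparison, the same parse can be sketched with Python's `strptime` (an analogy, not the function's implementation; note that Java-style pattern letters like `yyyy-MM-dd` map to different codes, here `%Y-%m-%d`):

```python
from datetime import datetime, timezone

# "yyyy-MM-dd" (Java DateTimeFormatter) ≈ "%Y-%m-%d" (C/Python strptime)
date = datetime.strptime("2022-05-06", "%Y-%m-%d").replace(tzinfo=timezone.utc)
print(date.isoformat())  # 2022-05-06T00:00:00+00:00
```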
## `DATE_TRUNC` [esql-date_trunc]
**Syntax**
:::{image} ../../../../../images/date_trunc.svg
:alt: Embedded
:class: text-center
:::
**Parameters**
`interval`
: Interval; expressed using the timespan literal syntax.
`date`
: Date expression
**Description**
Rounds down a date to the closest interval.
**Supported types**
| interval | date | result |
| --- | --- | --- |
| date_period | date | date |
| date_period | date_nanos | date_nanos |
| time_duration | date | date |
| time_duration | date_nanos | date_nanos |
**Examples**
```esql
FROM employees
| KEEP first_name, last_name, hire_date
| EVAL year_hired = DATE_TRUNC(1 year, hire_date)
```
| first_name:keyword | last_name:keyword | hire_date:date | year_hired:date |
| --- | --- | --- | --- |
| Alejandro | McAlpine | 1991-06-26T00:00:00.000Z | 1991-01-01T00:00:00.000Z |
| Amabile | Gomatam | 1992-11-18T00:00:00.000Z | 1992-01-01T00:00:00.000Z |
| Anneke | Preusig | 1989-06-02T00:00:00.000Z | 1989-01-01T00:00:00.000Z |
Combine `DATE_TRUNC` with [`STATS`](/reference/query-languages/esql/esql-commands.md#esql-stats-by) to create date histograms. For example, the number of hires per year:
```esql
FROM employees
| EVAL year = DATE_TRUNC(1 year, hire_date)
| STATS hires = COUNT(emp_no) BY year
| SORT year
```
| hires:long | year:date |
| --- | --- |
| 11 | 1985-01-01T00:00:00.000Z |
| 11 | 1986-01-01T00:00:00.000Z |
| 15 | 1987-01-01T00:00:00.000Z |
| 9 | 1988-01-01T00:00:00.000Z |
| 13 | 1989-01-01T00:00:00.000Z |
| 12 | 1990-01-01T00:00:00.000Z |
| 6 | 1991-01-01T00:00:00.000Z |
| 8 | 1992-01-01T00:00:00.000Z |
| 3 | 1993-01-01T00:00:00.000Z |
| 4 | 1994-01-01T00:00:00.000Z |
| 5 | 1995-01-01T00:00:00.000Z |
| 1 | 1996-01-01T00:00:00.000Z |
| 1 | 1997-01-01T00:00:00.000Z |
| 1 | 1999-01-01T00:00:00.000Z |
Or an hourly error rate:
```esql
FROM sample_data
| EVAL error = CASE(message LIKE "*error*", 1, 0)
| EVAL hour = DATE_TRUNC(1 hour, @timestamp)
| STATS error_rate = AVG(error) by hour
| SORT hour
```
| error_rate:double | hour:date |
| --- | --- |
| 0.0 | 2023-10-23T12:00:00.000Z |
| 0.6 | 2023-10-23T13:00:00.000Z |
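The rounding-down behavior for time durations can be sketched in Python using epoch arithmetic (an illustration under the assumption of fixed-length intervals, not the implementation; calendar intervals like `1 year` need calendar-aware logic):

```python
from datetime import datetime, timezone, timedelta

def date_trunc(interval: timedelta, ts: datetime) -> datetime:
    # Round down to the closest multiple of `interval` since the Unix epoch
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    n = (ts - epoch) // interval
    return epoch + n * interval

ts = datetime(2023, 10, 23, 13, 51, 54, tzinfo=timezone.utc)
print(date_trunc(timedelta(hours=1), ts).isoformat())  # 2023-10-23T13:00:00+00:00
```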
Round a number up to the nearest integer.
::::{note}
This is a noop for `long` (including unsigned) and `integer`. For `double` this picks the closest `double` value to the integer similar to [Math.ceil](https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/lang/Math.html#ceil(double)).
::::
Subtracts the `startTimestamp` from the `endTimestamp` and returns the difference in multiples of `unit`. If `startTimestamp` is later than the `endTimestamp`, negative values are returned.
**Datetime difference units**
| unit | abbreviations |
| --- | --- |
| year | years, yy, yyyy |
| quarter | quarters, qq, q |
| month | months, mm, m |
| dayofyear | dy, y |
| day | days, dd, d |
| week | weeks, wk, ww |
| weekday | weekdays, dw |
| hour | hours, hh |
| minute | minutes, mi, n |
| second | seconds, ss, s |
| millisecond | milliseconds, ms |
| microsecond | microseconds, mcs |
| nanosecond | nanoseconds, ns |
Note that while there is an overlap between the function's supported units and {{esql}}'s supported time span literals, these sets are distinct and not interchangeable. Similarly, the supported abbreviations are conveniently shared with implementations of this function in other established products and not necessarily common with the date-time nomenclature used by {{es}}.
Round a number down to the nearest integer.
::::{note}
This is a noop for `long` (including unsigned) and `integer`. For `double` this picks the closest `double` value to the integer similar to [Math.floor](https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/lang/Math.html#floor(double)).
::::
**Description**
Returns the maximum value from multiple columns. This is similar to [`MV_MAX`](/reference/query-languages/esql/esql-functions-operators.md#esql-mv_max) except it is intended to run on multiple columns at once.
::::{note}
When run on `keyword` or `text` fields, this returns the last string in alphabetical order. When run on `boolean` columns this will return `true` if any values are `true`.
::::
**Description**
Returns the minimum value from multiple columns. This is similar to [`MV_MIN`](/reference/query-languages/esql/esql-functions-operators.md#esql-mv_min) except it is intended to run on multiple columns at once.
**Description**
Use `MATCH` to perform a [match query](/reference/query-languages/query-dsl-match-query.md) on the specified field. Using `MATCH` is equivalent to using the `match` query in the Elasticsearch Query DSL. Match can be used on fields from the text family like [text](/reference/elasticsearch/mapping-reference/text.md) and [semantic_text](/reference/elasticsearch/mapping-reference/semantic-text.md), as well as other field types like keyword, boolean, dates, and numeric types. Match can use [function named parameters](/reference/query-languages/esql/esql-syntax.md#esql-function-named-params) to specify additional options for the match query. All [match query parameters](/reference/query-languages/query-dsl-match-query.md#match-field-params) are supported. For a simplified syntax, you can use the [match operator](/reference/query-languages/esql/esql-functions-operators.md#esql-search-operators) `:` instead of `MATCH`. `MATCH` returns true if the provided query matches the row.
**Description**
The value that is greater than half of all values and less than half of all values, also known as the 50% [`PERCENTILE`](/reference/query-languages/esql/esql-functions-operators.md#esql-percentile).
::::{note}
Like [`PERCENTILE`](/reference/query-languages/esql/esql-functions-operators.md#esql-percentile), `MEDIAN` is [usually approximate](/reference/query-languages/esql/esql-functions-operators.md#esql-percentile-approximate).
::::
Returns the median absolute deviation, a measure of variability. It is a robust statistic, meaning that it is useful for describing data that may have outliers, or may not be normally distributed. For such data it can be more descriptive than standard deviation. It is calculated as the median of each data points deviation from the median of the entire sample. That is, for a random variable `X`, the median absolute deviation is `median(|median(X) - X|)`.
::::{note}
Like [`PERCENTILE`](/reference/query-languages/esql/esql-functions-operators.md#esql-percentile), `MEDIAN_ABSOLUTE_DEVIATION` is [usually approximate](/reference/query-languages/esql/esql-functions-operators.md#esql-percentile-approximate).
::::
**Description**
Converts a multivalued expression into a single valued column containing the first value. This is most useful when reading from a function that emits multivalued columns in a known order like [`SPLIT`](/reference/query-languages/esql/esql-functions-operators.md#esql-split).
The order that [multivalued fields](/reference/query-languages/esql/esql-multivalued-fields.md) are read from underlying storage is not guaranteed. It is **frequently** ascending, but dont rely on that. If you need the minimum value use [`MV_MIN`](/reference/query-languages/esql/esql-functions-operators.md#esql-mv_min) instead of `MV_FIRST`. `MV_MIN` has optimizations for sorted values so there isnt a performance benefit to `MV_FIRST`.
**Description**
Converts a multivalue expression into a single valued column containing the last value. This is most useful when reading from a function that emits multivalued columns in a known order like [`SPLIT`](/reference/query-languages/esql/esql-functions-operators.md#esql-split).
The order that [multivalued fields](/reference/query-languages/esql/esql-multivalued-fields.md) are read from underlying storage is not guaranteed. It is **frequently** ascending, but dont rely on that. If you need the maximum value use [`MV_MAX`](/reference/query-languages/esql/esql-functions-operators.md#esql-mv_max) instead of `MV_LAST`. `MV_MAX` has optimizations for sorted values so there isnt a performance benefit to `MV_LAST`.
**Description**
Returns a subset of the multivalued field using the start and end index values. This is most useful when reading from a function that emits multivalued columns in a known order like [`SPLIT`](/reference/query-languages/esql/esql-functions-operators.md#esql-split) or [`MV_SORT`](/reference/query-languages/esql/esql-functions-operators.md#esql-mv_sort).
The order that [multivalued fields](/reference/query-languages/esql/esql-multivalued-fields.md) are read from underlying storage is not guaranteed. It is **frequently** ascending, but dont rely on that.
**Description**
Returns whether the first geometry contains the second geometry. This is the inverse of the [ST_WITHIN](/reference/query-languages/esql/esql-functions-operators.md#esql-st_within) function.
**Description**
Returns whether the two geometries or geometry columns are disjoint. This is the inverse of the [ST_INTERSECTS](/reference/query-languages/esql/esql-functions-operators.md#esql-st_intersects) function. In mathematical terms: ST_Disjoint(A, B) ⇔ A ⋂ B = ∅
**Description**
Returns true if two geometries intersect. They intersect if they have any point in common, including their interior points (points along lines or within polygons). This is the inverse of the [ST_DISJOINT](/reference/query-languages/esql/esql-functions-operators.md#esql-st_disjoint) function. In mathematical terms: ST_Intersects(A, B) ⇔ A ⋂ B ≠ ∅
**Description**
Returns whether the first geometry is within the second geometry. This is the inverse of the [ST_CONTAINS](/reference/query-languages/esql/esql-functions-operators.md#esql-st_contains) function.
% This is generated by ESQL's AbstractFunctionTestCase. Do not edit it. See ../README.md for how to regenerate it.
**Description**
Performs a Term query on the specified field. Returns true if the provided term matches the row.
**Description**
Converts an input value to a boolean value. A string value of `true` will be case-insensitively converted to the Boolean `true`. For anything else, including the empty string, the function will return `false`. The numerical value of `0` will be converted to `false`, anything else will be converted to `true`.

View File

@@ -2,7 +2,7 @@
**Description**
-Converts an input value to a date value. A string will only be successfully converted if its respecting the format `yyyy-MM-dd'T'HH:mm:ss.SSS'Z'`. To convert dates in other formats, use [`DATE_PARSE`](../../../esql-functions-operators.md#esql-date_parse).
+Converts an input value to a date value. A string will only be successfully converted if its respecting the format `yyyy-MM-dd'T'HH:mm:ss.SSS'Z'`. To convert dates in other formats, use [`DATE_PARSE`](/reference/query-languages/esql/esql-functions-operators.md#esql-date_parse).
::::{note}
Note that when converting from nanosecond resolution to millisecond resolution with this function, the nanosecond date is truncated, not rounded.
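The accepted layout and the truncate-not-round behaviour can be sketched in Python — `to_datetime` and `nanos_to_millis` are hypothetical names of ours, mirroring the documented behaviour rather than ES|QL's implementation:

```python
from datetime import datetime

def to_datetime(s: str) -> datetime:
    # Only the yyyy-MM-dd'T'HH:mm:ss.SSS'Z' layout is accepted, per the docs
    return datetime.strptime(s, "%Y-%m-%dT%H:%M:%S.%f%z")

def nanos_to_millis(ns: int) -> int:
    # Nanosecond -> millisecond resolution truncates; it does not round
    return ns // 1_000_000

print(to_datetime("2025-03-13T20:33:18.123Z"))
print(nanos_to_millis(1_999_999))  # 1, not rounded up to 2
```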


@@ -2,5 +2,5 @@
**Description**
-Converts an input value to a double value. If the input parameter is of a date type, its value will be interpreted as milliseconds since the [Unix epoch](https://en.wikipedia.org/wiki/Unix_time), converted to double. Boolean **true** will be converted to double **1.0**, **false** to **0.0**.
+Converts an input value to a double value. If the input parameter is of a date type, its value will be interpreted as milliseconds since the [Unix epoch](https://en.wikipedia.org/wiki/Unix_time), converted to double. Boolean `true` will be converted to double `1.0`, `false` to `0.0`.
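The date and boolean cases of this conversion can be sketched in Python — `to_double` is a hypothetical helper of ours, not ES|QL's API:

```python
from datetime import datetime, timezone

def to_double(value):
    if isinstance(value, datetime):
        # Dates become milliseconds since the Unix epoch, as a double
        return value.timestamp() * 1000.0
    if isinstance(value, bool):
        # true -> 1.0, false -> 0.0
        return 1.0 if value else 0.0
    return float(value)

print(to_double(datetime(2025, 1, 1, tzinfo=timezone.utc)))  # 1735689600000.0
print(to_double(True))   # 1.0
```

TO_INTEGER described below follows the same rules, producing integers instead of doubles.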


@@ -2,5 +2,5 @@
**Description**
-Converts an input value to an integer value. If the input parameter is of a date type, its value will be interpreted as milliseconds since the [Unix epoch](https://en.wikipedia.org/wiki/Unix_time), converted to integer. Boolean **true** will be converted to integer **1**, **false** to **0**.
+Converts an input value to an integer value. If the input parameter is of a date type, its value will be interpreted as milliseconds since the [Unix epoch](https://en.wikipedia.org/wiki/Unix_time), converted to integer. Boolean `true` will be converted to integer `1`, `false` to `0`.
