Manual backport of https://github.com/elastic/elasticsearch/pull/124312 and https://github.com/elastic/elasticsearch/pull/124742
This commit is contained in:
parent
ed93e24195
commit
d3d9a00fb1
|
@ -13,6 +13,9 @@ x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/parser/EsqlBasePar
|
|||
x-pack/plugin/esql/src/main/generated/** linguist-generated=true
|
||||
x-pack/plugin/esql/src/main/generated-src/** linguist-generated=true
|
||||
|
||||
# ESQL functions docs are autogenerated. More information at `docs/reference/esql/functions/README.md`
|
||||
docs/reference/esql/functions/*/** linguist-generated=true
|
||||
# ESQL functions docs are autogenerated. More information at `docs/reference/query-languages/esql/README.md`
|
||||
docs/reference/query-languages/esql/_snippets/functions/*/** linguist-generated=true
|
||||
#docs/reference/query-languages/esql/_snippets/operators/*/** linguist-generated=true
|
||||
docs/reference/query-languages/esql/images/** linguist-generated=true
|
||||
docs/reference/query-languages/esql/kibana/** linguist-generated=true
|
||||
|
||||
|
|
|
@ -2,8 +2,8 @@ project: 'Elasticsearch'
|
|||
exclude:
|
||||
- README.md
|
||||
- internal/*
|
||||
- reference/esql/functions/kibana/docs/*
|
||||
- reference/esql/functions/README.md
|
||||
- reference/query-languages/esql/kibana/docs/**
|
||||
- reference/query-languages/esql/README.md
|
||||
cross_links:
|
||||
- beats
|
||||
- cloud
|
||||
|
|
|
@ -0,0 +1,17 @@
|
|||
* configurable precision, which decides on how to trade memory for accuracy,
|
||||
* excellent accuracy on low-cardinality sets,
|
||||
* fixed memory usage: no matter if there are tens or billions of unique values, memory usage only depends on the configured precision.
|
||||
|
||||
For a precision threshold of `c`, the implementation that we are using requires about `c * 8` bytes.
|
||||
|
||||
The following chart shows how the error varies before and after the threshold:
|
||||
|
||||

|
||||
|
||||
For all 3 thresholds, counts have been accurate up to the configured threshold. Although not guaranteed,
|
||||
this is likely to be the case. Accuracy in practice depends on the dataset in question. In general,
|
||||
most datasets show consistently good accuracy. Also note that even with a threshold as low as 100,
|
||||
the error remains very low (1-6% as seen in the above graph) even when counting millions of items.
|
||||
|
||||
The HyperLogLog++ algorithm depends on the leading zeros of hashed values, the exact distributions of
|
||||
hashes in a dataset can affect the accuracy of the cardinality.
|
|
@ -1,60 +1,3 @@
|
|||
## `PERCENTILE` [esql-percentile]
|
||||
|
||||
**Syntax**
|
||||
|
||||
:::{image} ../../../../../images/percentile.svg
|
||||
:alt: Embedded
|
||||
:class: text-center
|
||||
:::
|
||||
|
||||
**Parameters**
|
||||
|
||||
true
|
||||
**Description**
|
||||
|
||||
Returns the value at which a certain percentage of observed values occur. For example, the 95th percentile is the value which is greater than 95% of the observed values and the 50th percentile is the `MEDIAN`.
|
||||
|
||||
**Supported types**
|
||||
|
||||
| number | percentile | result |
|
||||
| --- | --- | --- |
|
||||
| double | double | double |
|
||||
| double | integer | double |
|
||||
| double | long | double |
|
||||
| integer | double | double |
|
||||
| integer | integer | double |
|
||||
| integer | long | double |
|
||||
| long | double | double |
|
||||
| long | integer | double |
|
||||
| long | long | double |
|
||||
|
||||
**Examples**
|
||||
|
||||
```esql
|
||||
FROM employees
|
||||
| STATS p0 = PERCENTILE(salary, 0)
|
||||
, p50 = PERCENTILE(salary, 50)
|
||||
, p99 = PERCENTILE(salary, 99)
|
||||
```
|
||||
|
||||
| p0:double | p50:double | p99:double |
|
||||
| --- | --- | --- |
|
||||
| 25324 | 47003 | 74970.29 |
|
||||
|
||||
The expression can use inline functions. For example, to calculate a percentile of the maximum values of a multivalued column, first use `MV_MAX` to get the maximum value per row, and use the result with the `PERCENTILE` function
|
||||
|
||||
```esql
|
||||
FROM employees
|
||||
| STATS p80_max_salary_change = PERCENTILE(MV_MAX(salary_change), 80)
|
||||
```
|
||||
|
||||
| p80_max_salary_change:double |
|
||||
| --- |
|
||||
| 12.132 |
|
||||
|
||||
|
||||
### `PERCENTILE` is (usually) approximate [esql-percentile-approximate]
|
||||
|
||||
There are many different algorithms to calculate percentiles. The naive implementation simply stores all the values in a sorted array. To find the 50th percentile, you simply find the value that is at `my_array[count(my_array) * 0.5]`.
|
||||
|
||||
Clearly, the naive implementation does not scale — the sorted array grows linearly with the number of values in your dataset. To calculate percentiles across potentially billions of values in an Elasticsearch cluster, *approximate* percentiles are calculated.
|
||||
|
@ -72,11 +15,3 @@ The following chart shows the relative error on a uniform distribution depending
|
|||

|
||||
|
||||
It shows how precision is better for extreme percentiles. The reason why error diminishes for large number of values is that the law of large numbers makes the distribution of values more and more uniform and the t-digest tree can do a better job at summarizing it. It would not be the case on more skewed distributions.
|
||||
|
||||
::::{warning}
|
||||
`PERCENTILE` is also [non-deterministic](https://en.wikipedia.org/wiki/Nondeterministic_algorithm). This means you can get slightly different results using the same data.
|
||||
|
||||
::::
|
||||
|
||||
|
||||
|
|
@ -65,19 +65,8 @@ Computing exact counts requires loading values into a hash set and returning its
|
|||
|
||||
This `cardinality` aggregation is based on the [HyperLogLog++](https://static.googleusercontent.com/media/research.google.com/fr//pubs/archive/40671.pdf) algorithm, which counts based on the hashes of the values with some interesting properties:
|
||||
|
||||
* configurable precision, which decides on how to trade memory for accuracy,
|
||||
* excellent accuracy on low-cardinality sets,
|
||||
* fixed memory usage: no matter if there are tens or billions of unique values, memory usage only depends on the configured precision.
|
||||
|
||||
For a precision threshold of `c`, the implementation that we are using requires about `c * 8` bytes.
|
||||
|
||||
The following chart shows how the error varies before and after the threshold:
|
||||
|
||||

|
||||
|
||||
For all 3 thresholds, counts have been accurate up to the configured threshold. Although not guaranteed, this is likely to be the case. Accuracy in practice depends on the dataset in question. In general, most datasets show consistently good accuracy. Also note that even with a threshold as low as 100, the error remains very low (1-6% as seen in the above graph) even when counting millions of items.
|
||||
|
||||
The HyperLogLog++ algorithm depends on the leading zeros of hashed values, the exact distributions of hashes in a dataset can affect the accuracy of the cardinality.
|
||||
:::{include} _snippets/search-aggregations-metrics-cardinality-aggregation-explanation.md
|
||||
:::
|
||||
|
||||
|
||||
## Pre-computed hashes [_pre_computed_hashes]
|
||||
|
|
|
@ -175,31 +175,14 @@ GET latency/_search
|
|||
|
||||
## Percentiles are (usually) approximate [search-aggregations-metrics-percentile-aggregation-approximation]
|
||||
|
||||
There are many different algorithms to calculate percentiles. The naive implementation simply stores all the values in a sorted array. To find the 50th percentile, you simply find the value that is at `my_array[count(my_array) * 0.5]`.
|
||||
|
||||
Clearly, the naive implementation does not scale — the sorted array grows linearly with the number of values in your dataset. To calculate percentiles across potentially billions of values in an Elasticsearch cluster, *approximate* percentiles are calculated.
|
||||
|
||||
The algorithm used by the `percentile` metric is called TDigest (introduced by Ted Dunning in [Computing Accurate Quantiles using T-Digests](https://github.com/tdunning/t-digest/blob/master/docs/t-digest-paper/histo.pdf)).
|
||||
|
||||
When using this metric, there are a few guidelines to keep in mind:
|
||||
|
||||
* Accuracy is proportional to `q(1-q)`. This means that extreme percentiles (e.g. 99%) are more accurate than less extreme percentiles, such as the median
|
||||
* For small sets of values, percentiles are highly accurate (and potentially 100% accurate if the data is small enough).
|
||||
* As the quantity of values in a bucket grows, the algorithm begins to approximate the percentiles. It is effectively trading accuracy for memory savings. The exact level of inaccuracy is difficult to generalize, since it depends on your data distribution and volume of data being aggregated
|
||||
|
||||
The following chart shows the relative error on a uniform distribution depending on the number of collected values and the requested percentile:
|
||||
|
||||

|
||||
|
||||
It shows how precision is better for extreme percentiles. The reason why error diminishes for large number of values is that the law of large numbers makes the distribution of values more and more uniform and the t-digest tree can do a better job at summarizing it. It would not be the case on more skewed distributions.
|
||||
:::{include} /reference/data-analysis/aggregations/_snippets/search-aggregations-metrics-percentile-aggregation-approximate.md
|
||||
:::
|
||||
|
||||
::::{warning}
|
||||
Percentile aggregations are also [non-deterministic](https://en.wikipedia.org/wiki/Nondeterministic_algorithm). This means you can get slightly different results using the same data.
|
||||
|
||||
::::
|
||||
|
||||
|
||||
|
||||
## Compression [search-aggregations-metrics-percentile-aggregation-compression]
|
||||
|
||||
Approximate algorithms must balance memory utilization with estimation accuracy. This balance can be controlled using a `compression` parameter:
|
||||
|
|
|
@ -1,23 +0,0 @@
|
|||
The files in these subdirectories are generated by ESQL's test suite:
|
||||
* `description` - description of each function scraped from `@FunctionInfo#description`
|
||||
* `examples` - examples of each function scraped from `@FunctionInfo#examples`
|
||||
* `parameters` - description of each function's parameters scraped from `@Param`
|
||||
* `signature` - railroad diagram of the syntax to invoke each function
|
||||
* `types` - a table of each combination of support type for each parameter. These are generated from tests.
|
||||
* `layout` - a fully generated description for each function
|
||||
* `kibana/definition` - function definitions for kibana's ESQL editor
|
||||
* `kibana/docs` - the inline docs for kibana
|
||||
|
||||
Most functions can use the generated docs generated in the `layout` directory.
|
||||
If we need something more custom for the function we can make a file in this
|
||||
directory that can `include::` any parts of the files above.
|
||||
|
||||
To regenerate the files for a function run its tests using gradle:
|
||||
```
|
||||
./gradlew :x-pack:plugin:esql:test -Dtests.class='*SinTests'
|
||||
```
|
||||
|
||||
To regenerate the files for all functions run all of ESQL's tests using gradle:
|
||||
```
|
||||
./gradlew :x-pack:plugin:esql:test
|
||||
```
|
|
@ -1,17 +0,0 @@
|
|||
{
|
||||
"comment" : "This is generated by ESQL's AbstractFunctionTestCase. Do no edit it. See ../README.md for how to regenerate it.",
|
||||
"type" : "scalar",
|
||||
"name" : "pi",
|
||||
"description" : "Returns Pi, the ratio of a circle's circumference to its diameter.",
|
||||
"signatures" : [
|
||||
{
|
||||
"params" : [ ],
|
||||
"returnType" : "double"
|
||||
}
|
||||
],
|
||||
"examples" : [
|
||||
"ROW PI()"
|
||||
],
|
||||
"preview" : false,
|
||||
"snapshot_only" : false
|
||||
}
|
|
@ -1,17 +0,0 @@
|
|||
{
|
||||
"comment" : "This is generated by ESQL's AbstractFunctionTestCase. Do no edit it. See ../README.md for how to regenerate it.",
|
||||
"type" : "scalar",
|
||||
"name" : "tau",
|
||||
"description" : "Returns the ratio of a circle's circumference to its radius.",
|
||||
"signatures" : [
|
||||
{
|
||||
"params" : [ ],
|
||||
"returnType" : "double"
|
||||
}
|
||||
],
|
||||
"examples" : [
|
||||
"ROW TAU()"
|
||||
],
|
||||
"preview" : false,
|
||||
"snapshot_only" : false
|
||||
}
|
|
@ -1,11 +0,0 @@
|
|||
<!--
|
||||
This is generated by ESQL's AbstractFunctionTestCase. Do no edit it. See ../README.md for how to regenerate it.
|
||||
-->
|
||||
|
||||
### ABS
|
||||
Returns the absolute value.
|
||||
|
||||
```
|
||||
ROW number = -1.0
|
||||
| EVAL abs_number = ABS(number)
|
||||
```
|
|
@ -1,11 +0,0 @@
|
|||
<!--
|
||||
This is generated by ESQL's AbstractFunctionTestCase. Do no edit it. See ../README.md for how to regenerate it.
|
||||
-->
|
||||
|
||||
### AVG
|
||||
The average of a numeric field.
|
||||
|
||||
```
|
||||
FROM employees
|
||||
| STATS AVG(height)
|
||||
```
|
|
@ -1,12 +0,0 @@
|
|||
<!--
|
||||
This is generated by ESQL's AbstractFunctionTestCase. Do no edit it. See ../README.md for how to regenerate it.
|
||||
-->
|
||||
|
||||
### CIDR_MATCH
|
||||
Returns true if the provided IP is contained in one of the provided CIDR blocks.
|
||||
|
||||
```
|
||||
FROM hosts
|
||||
| WHERE CIDR_MATCH(ip1, "127.0.0.2/32", "127.0.0.3/32")
|
||||
| KEEP card, host, ip0, ip1
|
||||
```
|
|
@ -1,11 +0,0 @@
|
|||
<!--
|
||||
This is generated by ESQL's AbstractFunctionTestCase. Do no edit it. See ../README.md for how to regenerate it.
|
||||
-->
|
||||
|
||||
### COS
|
||||
Returns the cosine of an angle.
|
||||
|
||||
```
|
||||
ROW a=1.8
|
||||
| EVAL cos=COS(a)
|
||||
```
|
|
@ -1,11 +0,0 @@
|
|||
<!--
|
||||
This is generated by ESQL's AbstractFunctionTestCase. Do no edit it. See ../README.md for how to regenerate it.
|
||||
-->
|
||||
|
||||
### COSH
|
||||
Returns the hyperbolic cosine of a number.
|
||||
|
||||
```
|
||||
ROW a=1.8
|
||||
| EVAL cosh=COSH(a)
|
||||
```
|
|
@ -1,10 +0,0 @@
|
|||
<!--
|
||||
This is generated by ESQL's AbstractFunctionTestCase. Do no edit it. See ../README.md for how to regenerate it.
|
||||
-->
|
||||
|
||||
### E
|
||||
Returns Euler's number.
|
||||
|
||||
```
|
||||
ROW E()
|
||||
```
|
|
@ -1,11 +0,0 @@
|
|||
<!--
|
||||
This is generated by ESQL's AbstractFunctionTestCase. Do no edit it. See ../README.md for how to regenerate it.
|
||||
-->
|
||||
|
||||
### FROM_BASE64
|
||||
Decode a base64 string.
|
||||
|
||||
```
|
||||
row a = "ZWxhc3RpYw=="
|
||||
| eval d = from_base64(a)
|
||||
```
|
|
@ -1,14 +0,0 @@
|
|||
<!--
|
||||
This is generated by ESQL's AbstractFunctionTestCase. Do no edit it. See ../README.md for how to regenerate it.
|
||||
-->
|
||||
|
||||
### KQL
|
||||
Performs a KQL query. Returns true if the provided KQL query string matches the row.
|
||||
|
||||
```
|
||||
FROM books
|
||||
| WHERE KQL("author: Faulkner")
|
||||
| KEEP book_no, author
|
||||
| SORT book_no
|
||||
| LIMIT 5
|
||||
```
|
|
@ -1,14 +0,0 @@
|
|||
<!--
|
||||
This is generated by ESQL's AbstractFunctionTestCase. Do no edit it. See ../README.md for how to regenerate it.
|
||||
-->
|
||||
|
||||
### LEFT
|
||||
Returns the substring that extracts 'length' chars from 'string' starting from the left.
|
||||
|
||||
```
|
||||
FROM employees
|
||||
| KEEP last_name
|
||||
| EVAL left = LEFT(last_name, 3)
|
||||
| SORT last_name ASC
|
||||
| LIMIT 5
|
||||
```
|
|
@ -1,11 +0,0 @@
|
|||
<!--
|
||||
This is generated by ESQL's AbstractFunctionTestCase. Do no edit it. See ../README.md for how to regenerate it.
|
||||
-->
|
||||
|
||||
### MAX
|
||||
The maximum value of a field.
|
||||
|
||||
```
|
||||
FROM employees
|
||||
| STATS MAX(languages)
|
||||
```
|
|
@ -1,13 +0,0 @@
|
|||
<!--
|
||||
This is generated by ESQL's AbstractFunctionTestCase. Do no edit it. See ../README.md for how to regenerate it.
|
||||
-->
|
||||
|
||||
### MD5
|
||||
Computes the MD5 hash of the input.
|
||||
|
||||
```
|
||||
FROM sample_data
|
||||
| WHERE message != "Connection error"
|
||||
| EVAL md5 = md5(message)
|
||||
| KEEP message, md5;
|
||||
```
|
|
@ -1,11 +0,0 @@
|
|||
<!--
|
||||
This is generated by ESQL's AbstractFunctionTestCase. Do no edit it. See ../README.md for how to regenerate it.
|
||||
-->
|
||||
|
||||
### MIN
|
||||
The minimum value of a field.
|
||||
|
||||
```
|
||||
FROM employees
|
||||
| STATS MIN(languages)
|
||||
```
|
|
@ -1,7 +0,0 @@
|
|||
<!--
|
||||
This is generated by ESQL's AbstractFunctionTestCase. Do no edit it. See ../README.md for how to regenerate it.
|
||||
-->
|
||||
|
||||
### MV_APPEND
|
||||
Concatenates values of two multi-value fields.
|
||||
|
|
@ -1,12 +0,0 @@
|
|||
<!--
|
||||
This is generated by ESQL's AbstractFunctionTestCase. Do no edit it. See ../README.md for how to regenerate it.
|
||||
-->
|
||||
|
||||
### MV_DEDUPE
|
||||
Remove duplicate values from a multivalued field.
|
||||
|
||||
```
|
||||
ROW a=["foo", "foo", "bar", "foo"]
|
||||
| EVAL dedupe_a = MV_DEDUPE(a)
|
||||
```
|
||||
Note: `MV_DEDUPE` may, but won't always, sort the values in the column.
|
|
@ -1,7 +0,0 @@
|
|||
<!--
|
||||
This is generated by ESQL's AbstractFunctionTestCase. Do no edit it. See ../README.md for how to regenerate it.
|
||||
-->
|
||||
|
||||
### NEG
|
||||
Returns the negation of the argument.
|
||||
|
|
@ -1,10 +0,0 @@
|
|||
<!--
|
||||
This is generated by ESQL's AbstractFunctionTestCase. Do no edit it. See ../README.md for how to regenerate it.
|
||||
-->
|
||||
|
||||
### NOT_RLIKE
|
||||
Use `NOT RLIKE` to filter data based on string patterns using using
|
||||
<<regexp-syntax,regular expressions>>. `NOT RLIKE` usually acts on a field placed on
|
||||
the left-hand side of the operator, but it can also act on a constant (literal)
|
||||
expression. The right-hand side of the operator represents the pattern.
|
||||
|
|
@ -1,10 +0,0 @@
|
|||
<!--
|
||||
This is generated by ESQL's AbstractFunctionTestCase. Do no edit it. See ../README.md for how to regenerate it.
|
||||
-->
|
||||
|
||||
### NOW
|
||||
Returns current date and time.
|
||||
|
||||
```
|
||||
ROW current_date = NOW()
|
||||
```
|
|
@ -1,10 +0,0 @@
|
|||
<!--
|
||||
This is generated by ESQL's AbstractFunctionTestCase. Do no edit it. See ../README.md for how to regenerate it.
|
||||
-->
|
||||
|
||||
### PI
|
||||
Returns Pi, the ratio of a circle's circumference to its diameter.
|
||||
|
||||
```
|
||||
ROW PI()
|
||||
```
|
|
@ -1,14 +0,0 @@
|
|||
<!--
|
||||
This is generated by ESQL's AbstractFunctionTestCase. Do no edit it. See ../README.md for how to regenerate it.
|
||||
-->
|
||||
|
||||
### RIGHT
|
||||
Return the substring that extracts 'length' chars from 'str' starting from the right.
|
||||
|
||||
```
|
||||
FROM employees
|
||||
| KEEP last_name
|
||||
| EVAL right = RIGHT(last_name, 3)
|
||||
| SORT last_name ASC
|
||||
| LIMIT 5
|
||||
```
|
|
@ -1,13 +0,0 @@
|
|||
<!--
|
||||
This is generated by ESQL's AbstractFunctionTestCase. Do no edit it. See ../README.md for how to regenerate it.
|
||||
-->
|
||||
|
||||
### SHA1
|
||||
Computes the SHA1 hash of the input.
|
||||
|
||||
```
|
||||
FROM sample_data
|
||||
| WHERE message != "Connection error"
|
||||
| EVAL sha1 = sha1(message)
|
||||
| KEEP message, sha1;
|
||||
```
|
|
@ -1,13 +0,0 @@
|
|||
<!--
|
||||
This is generated by ESQL's AbstractFunctionTestCase. Do no edit it. See ../README.md for how to regenerate it.
|
||||
-->
|
||||
|
||||
### SHA256
|
||||
Computes the SHA256 hash of the input.
|
||||
|
||||
```
|
||||
FROM sample_data
|
||||
| WHERE message != "Connection error"
|
||||
| EVAL sha256 = sha256(message)
|
||||
| KEEP message, sha256;
|
||||
```
|
|
@ -1,11 +0,0 @@
|
|||
<!--
|
||||
This is generated by ESQL's AbstractFunctionTestCase. Do no edit it. See ../README.md for how to regenerate it.
|
||||
-->
|
||||
|
||||
### SIN
|
||||
Returns the sine of an angle.
|
||||
|
||||
```
|
||||
ROW a=1.8
|
||||
| EVAL sin=SIN(a)
|
||||
```
|
|
@ -1,11 +0,0 @@
|
|||
<!--
|
||||
This is generated by ESQL's AbstractFunctionTestCase. Do no edit it. See ../README.md for how to regenerate it.
|
||||
-->
|
||||
|
||||
### SINH
|
||||
Returns the hyperbolic sine of a number.
|
||||
|
||||
```
|
||||
ROW a=1.8
|
||||
| EVAL sinh=SINH(a)
|
||||
```
|
|
@ -1,11 +0,0 @@
|
|||
<!--
|
||||
This is generated by ESQL's AbstractFunctionTestCase. Do no edit it. See ../README.md for how to regenerate it.
|
||||
-->
|
||||
|
||||
### STD_DEV
|
||||
The standard deviation of a numeric field.
|
||||
|
||||
```
|
||||
FROM employees
|
||||
| STATS STD_DEV(height)
|
||||
```
|
|
@ -1,11 +0,0 @@
|
|||
<!--
|
||||
This is generated by ESQL's AbstractFunctionTestCase. Do no edit it. See ../README.md for how to regenerate it.
|
||||
-->
|
||||
|
||||
### SUM
|
||||
The sum of a numeric expression.
|
||||
|
||||
```
|
||||
FROM employees
|
||||
| STATS SUM(languages)
|
||||
```
|
|
@ -1,11 +0,0 @@
|
|||
<!--
|
||||
This is generated by ESQL's AbstractFunctionTestCase. Do no edit it. See ../README.md for how to regenerate it.
|
||||
-->
|
||||
|
||||
### TAN
|
||||
Returns the tangent of an angle.
|
||||
|
||||
```
|
||||
ROW a=1.8
|
||||
| EVAL tan=TAN(a)
|
||||
```
|
|
@ -1,11 +0,0 @@
|
|||
<!--
|
||||
This is generated by ESQL's AbstractFunctionTestCase. Do no edit it. See ../README.md for how to regenerate it.
|
||||
-->
|
||||
|
||||
### TANH
|
||||
Returns the hyperbolic tangent of a number.
|
||||
|
||||
```
|
||||
ROW a=1.8
|
||||
| EVAL tanh=TANH(a)
|
||||
```
|
|
@ -1,10 +0,0 @@
|
|||
<!--
|
||||
This is generated by ESQL's AbstractFunctionTestCase. Do no edit it. See ../README.md for how to regenerate it.
|
||||
-->
|
||||
|
||||
### TAU
|
||||
Returns the ratio of a circle's circumference to its radius.
|
||||
|
||||
```
|
||||
ROW TAU()
|
||||
```
|
|
@ -1,13 +0,0 @@
|
|||
<!--
|
||||
This is generated by ESQL's AbstractFunctionTestCase. Do no edit it. See ../README.md for how to regenerate it.
|
||||
-->
|
||||
|
||||
### TERM
|
||||
Performs a Term query on the specified field. Returns true if the provided term matches the row.
|
||||
|
||||
```
|
||||
FROM books
|
||||
| WHERE TERM(author, "gabriel")
|
||||
| KEEP book_no, title
|
||||
| LIMIT 3;
|
||||
```
|
|
@ -1,11 +0,0 @@
|
|||
<!--
|
||||
This is generated by ESQL's AbstractFunctionTestCase. Do no edit it. See ../README.md for how to regenerate it.
|
||||
-->
|
||||
|
||||
### TO_BASE64
|
||||
Encode a string to a base64 string.
|
||||
|
||||
```
|
||||
row a = "elastic"
|
||||
| eval e = to_base64(a)
|
||||
```
|
|
@ -1,14 +0,0 @@
|
|||
<!--
|
||||
This is generated by ESQL's AbstractFunctionTestCase. Do no edit it. See ../README.md for how to regenerate it.
|
||||
-->
|
||||
|
||||
### TO_BOOLEAN
|
||||
Converts an input value to a boolean value.
|
||||
A string value of *true* will be case-insensitive converted to the Boolean *true*.
|
||||
For anything else, including the empty string, the function will return *false*.
|
||||
The numerical value of *0* will be converted to *false*, anything else will be converted to *true*.
|
||||
|
||||
```
|
||||
ROW str = ["true", "TRuE", "false", "", "yes", "1"]
|
||||
| EVAL bool = TO_BOOLEAN(str)
|
||||
```
|
|
@ -1,11 +0,0 @@
|
|||
<!--
|
||||
This is generated by ESQL's AbstractFunctionTestCase. Do no edit it. See ../README.md for how to regenerate it.
|
||||
-->
|
||||
|
||||
### TO_STRING
|
||||
Converts an input value into a string.
|
||||
|
||||
```
|
||||
ROW a=10
|
||||
| EVAL j = TO_STRING(a)
|
||||
```
|
|
@ -1,10 +0,0 @@
|
|||
<!--
|
||||
This is generated by ESQL's AbstractFunctionTestCase. Do no edit it. See ../README.md for how to regenerate it.
|
||||
-->
|
||||
|
||||
### TO_VERSION
|
||||
Converts an input string to a version value.
|
||||
|
||||
```
|
||||
ROW v = TO_VERSION("1.2.3")
|
||||
```
|
|
@ -1,21 +0,0 @@
|
|||
{
|
||||
"bool" : "to_boolean",
|
||||
"boolean" : "to_boolean",
|
||||
"cartesian_point" : "to_cartesianpoint",
|
||||
"cartesian_shape" : "to_cartesianshape",
|
||||
"date_nanos" : "to_date_nanos",
|
||||
"date_period" : "to_dateperiod",
|
||||
"datetime" : "to_datetime",
|
||||
"double" : "to_double",
|
||||
"geo_point" : "to_geopoint",
|
||||
"geo_shape" : "to_geoshape",
|
||||
"int" : "to_integer",
|
||||
"integer" : "to_integer",
|
||||
"ip" : "to_ip",
|
||||
"keyword" : "to_string",
|
||||
"long" : "to_long",
|
||||
"string" : "to_string",
|
||||
"time_duration" : "to_timeduration",
|
||||
"unsigned_long" : "to_unsigned_long",
|
||||
"version" : "to_version"
|
||||
}
|
|
@ -0,0 +1,50 @@
|
|||
The ES|QL documentation is composed of static content and generated content.
|
||||
The static content exists in this directory and can be edited by hand.
|
||||
However, the sub-directories `_snippets`, `images` and `kibana` contain mostly
|
||||
generated content.
|
||||
|
||||
### _snippets
|
||||
|
||||
In `_snippets` there are files that can be included within other files
|
||||
using the [File Inclusion](https://elastic.github.io/docs-builder/syntax/file_inclusion/)
|
||||
feature of the Elastic Docs V3 system.
|
||||
Most, but not all, files in this directory are generated.
|
||||
In particular the directories `_snippets/functions/*` and `_snippets/operators/*`
|
||||
contain subdirectories that are mostly generated:
|
||||
|
||||
* `description` - description of each function scraped from `@FunctionInfo#description`
|
||||
* `examples` - examples of each function scraped from `@FunctionInfo#examples`
|
||||
* `parameters` - description of each function's parameters scraped from `@Param`
|
||||
* `signature` - railroad diagram of the syntax to invoke each function
|
||||
* `types` - a table of each combination of support type for each parameter. These are generated from tests.
|
||||
* `layout` - a fully generated description for each function
|
||||
|
||||
Most functions can use the generated docs generated in the `layout` directory.
|
||||
If we need something more custom for the function we can make a file in this
|
||||
directory that can `include::` any parts of the files above.
|
||||
|
||||
To regenerate the files for a function run its tests using gradle.
|
||||
For example to generate docs for the `CASE` function:
|
||||
```
|
||||
./gradlew :x-pack:plugin:esql:test -Dtests.class='CaseTests'
|
||||
```
|
||||
|
||||
To regenerate the files for all functions run all of ESQL's tests using gradle:
|
||||
```
|
||||
./gradlew :x-pack:plugin:esql:test
|
||||
```
|
||||
|
||||
### images
|
||||
|
||||
The `images` directory contains `functions` and `operators` sub-directories with
|
||||
the `*.svg` files used to describe the syntax of each function or operator.
|
||||
These are all generated by the same tests that generate the functions and operators docs above.
|
||||
|
||||
### kibana
|
||||
|
||||
The `kibana` directory contains `definition` and `docs` sub-directories that are generated:
|
||||
|
||||
* `kibana/definition` - function definitions for kibana's ESQL editor
|
||||
* `kibana/docs` - the inline docs for kibana
|
||||
|
||||
These are also generated as part of the unit tests described above.
|
|
@ -1,53 +0,0 @@
|
|||
## {{esql}} aggregate functions [esql-agg-functions]
|
||||
|
||||
|
||||
The [`STATS`](/reference/query-languages/esql/esql-commands.md#esql-stats-by) command supports these aggregate functions:
|
||||
|
||||
:::{include} lists/aggregation-functions.md
|
||||
:::
|
||||
|
||||
:::{include} functions/avg.md
|
||||
:::
|
||||
|
||||
:::{include} functions/count.md
|
||||
:::
|
||||
|
||||
:::{include} functions/count_distinct.md
|
||||
:::
|
||||
|
||||
:::{include} functions/max.md
|
||||
:::
|
||||
|
||||
:::{include} functions/median.md
|
||||
:::
|
||||
|
||||
:::{include} functions/median_absolute_deviation.md
|
||||
:::
|
||||
|
||||
:::{include} functions/min.md
|
||||
:::
|
||||
|
||||
:::{include} functions/percentile.md
|
||||
:::
|
||||
|
||||
:::{include} functions/st_centroid_agg.md
|
||||
:::
|
||||
|
||||
:::{include} functions/st_extent_agg.md
|
||||
:::
|
||||
|
||||
:::{include} functions/std_dev.md
|
||||
:::
|
||||
|
||||
:::{include} functions/sum.md
|
||||
:::
|
||||
|
||||
:::{include} functions/top.md
|
||||
:::
|
||||
|
||||
:::{include} functions/values.md
|
||||
:::
|
||||
|
||||
:::{include} functions/weighted_avg.md
|
||||
:::
|
||||
|
|
@ -1,930 +0,0 @@
|
|||
## {{esql}} aggregate functions [esql-agg-functions]
|
||||
|
||||
|
||||
The [`STATS`](/reference/query-languages/esql/esql-commands.md#esql-stats-by) command supports these aggregate functions:
|
||||
|
||||
:::{include} lists/aggregation-functions.md
|
||||
:::
|
||||
|
||||
## `AVG` [esql-avg]
|
||||
|
||||
**Syntax**
|
||||
|
||||
:::{image} ../../../../images/avg.svg
|
||||
:alt: Embedded
|
||||
:class: text-center
|
||||
:::
|
||||
|
||||
**Parameters**
|
||||
|
||||
true
|
||||
**Description**
|
||||
|
||||
The average of a numeric field.
|
||||
|
||||
**Supported types**
|
||||
|
||||
| number | result |
|
||||
| --- | --- |
|
||||
| double | double |
|
||||
| integer | double |
|
||||
| long | double |
|
||||
|
||||
**Examples**
|
||||
|
||||
```esql
|
||||
FROM employees
|
||||
| STATS AVG(height)
|
||||
```
|
||||
|
||||
| AVG(height):double |
|
||||
| --- |
|
||||
| 1.7682 |
|
||||
|
||||
The expression can use inline functions. For example, to calculate the average over a multivalued column, first use `MV_AVG` to average the multiple values per row, and use the result with the `AVG` function
|
||||
|
||||
```esql
|
||||
FROM employees
|
||||
| STATS avg_salary_change = ROUND(AVG(MV_AVG(salary_change)), 10)
|
||||
```
|
||||
|
||||
| avg_salary_change:double |
|
||||
| --- |
|
||||
| 1.3904535865 |
|
||||
|
||||
|
||||
## `COUNT` [esql-count]
|
||||
|
||||
**Syntax**
|
||||
|
||||
:::{image} ../../../../images/count.svg
|
||||
:alt: Embedded
|
||||
:class: text-center
|
||||
:::
|
||||
|
||||
**Parameters**
|
||||
|
||||
`field`
|
||||
: Expression that outputs values to be counted. If omitted, equivalent to `COUNT(*)` (the number of rows).
|
||||
|
||||
**Description**
|
||||
|
||||
Returns the total number (count) of input values.
|
||||
|
||||
**Supported types**
|
||||
|
||||
| field | result |
|
||||
| --- | --- |
|
||||
| boolean | long |
|
||||
| cartesian_point | long |
|
||||
| date | long |
|
||||
| double | long |
|
||||
| geo_point | long |
|
||||
| integer | long |
|
||||
| ip | long |
|
||||
| keyword | long |
|
||||
| long | long |
|
||||
| text | long |
|
||||
| unsigned_long | long |
|
||||
| version | long |
|
||||
|
||||
**Examples**
|
||||
|
||||
```esql
|
||||
FROM employees
|
||||
| STATS COUNT(height)
|
||||
```
|
||||
|
||||
| COUNT(height):long |
|
||||
| --- |
|
||||
| 100 |
|
||||
|
||||
To count the number of rows, use `COUNT()` or `COUNT(*)`
|
||||
|
||||
```esql
|
||||
FROM employees
|
||||
| STATS count = COUNT(*) BY languages
|
||||
| SORT languages DESC
|
||||
```
|
||||
|
||||
| count:long | languages:integer |
|
||||
| --- | --- |
|
||||
| 10 | null |
|
||||
| 21 | 5 |
|
||||
| 18 | 4 |
|
||||
| 17 | 3 |
|
||||
| 19 | 2 |
|
||||
| 15 | 1 |
|
||||
|
||||
The expression can use inline functions. This example splits a string into multiple values using the `SPLIT` function and counts the values
|
||||
|
||||
```esql
|
||||
ROW words="foo;bar;baz;qux;quux;foo"
|
||||
| STATS word_count = COUNT(SPLIT(words, ";"))
|
||||
```
|
||||
|
||||
| word_count:long |
|
||||
| --- |
|
||||
| 6 |
|
||||
|
||||
To count the number of times an expression returns `TRUE` use a [`WHERE`](/reference/query-languages/esql/esql-commands.md#esql-where) command to remove rows that shouldn’t be included
|
||||
|
||||
```esql
|
||||
ROW n=1
|
||||
| WHERE n < 0
|
||||
| STATS COUNT(n)
|
||||
```
|
||||
|
||||
| COUNT(n):long |
|
||||
| --- |
|
||||
| 0 |
|
||||
|
||||
To count the same stream of data based on two different expressions use the pattern `COUNT(<expression> OR NULL)`. This builds on the three-valued logic ({{wikipedia}}/Three-valued_logic[3VL]) of the language: `TRUE OR NULL` is `TRUE`, but `FALSE OR NULL` is `NULL`, plus the way COUNT handles `NULL`s: `COUNT(TRUE)` and `COUNT(FALSE)` are both 1, but `COUNT(NULL)` is 0.
|
||||
|
||||
```esql
|
||||
ROW n=1
|
||||
| STATS COUNT(n > 0 OR NULL), COUNT(n < 0 OR NULL)
|
||||
```
|
||||
|
||||
| COUNT(n > 0 OR NULL):long | COUNT(n < 0 OR NULL):long |
|
||||
| --- | --- |
|
||||
| 1 | 0 |
|
||||
|
||||
|
||||
## `COUNT_DISTINCT` [esql-count_distinct]
|
||||
|
||||
**Syntax**
|
||||
|
||||
:::{image} ../../../../images/count_distinct.svg
|
||||
:alt: Embedded
|
||||
:class: text-center
|
||||
:::
|
||||
|
||||
**Parameters**
|
||||
|
||||
`field`
|
||||
: Column or literal for which to count the number of distinct values.
|
||||
|
||||
`precision`
|
||||
: Precision threshold. Refer to [Counts are approximate](../esql-functions-operators.md#esql-agg-count-distinct-approximate). The maximum supported value is 40000. Thresholds above this number will have the same effect as a threshold of 40000. The default value is 3000.
|
||||
|
||||
**Description**
|
||||
|
||||
Returns the approximate number of distinct values.
|
||||
|
||||
**Supported types**
|
||||
|
||||
| field | precision | result |
|
||||
| --- | --- | --- |
|
||||
| boolean | integer | long |
|
||||
| boolean | long | long |
|
||||
| boolean | unsigned_long | long |
|
||||
| boolean | | long |
|
||||
| date | integer | long |
|
||||
| date | long | long |
|
||||
| date | unsigned_long | long |
|
||||
| date | | long |
|
||||
| date_nanos | integer | long |
|
||||
| date_nanos | long | long |
|
||||
| date_nanos | unsigned_long | long |
|
||||
| date_nanos | | long |
|
||||
| double | integer | long |
|
||||
| double | long | long |
|
||||
| double | unsigned_long | long |
|
||||
| double | | long |
|
||||
| integer | integer | long |
|
||||
| integer | long | long |
|
||||
| integer | unsigned_long | long |
|
||||
| integer | | long |
|
||||
| ip | integer | long |
|
||||
| ip | long | long |
|
||||
| ip | unsigned_long | long |
|
||||
| ip | | long |
|
||||
| keyword | integer | long |
|
||||
| keyword | long | long |
|
||||
| keyword | unsigned_long | long |
|
||||
| keyword | | long |
|
||||
| long | integer | long |
|
||||
| long | long | long |
|
||||
| long | unsigned_long | long |
|
||||
| long | | long |
|
||||
| text | integer | long |
|
||||
| text | long | long |
|
||||
| text | unsigned_long | long |
|
||||
| text | | long |
|
||||
| version | integer | long |
|
||||
| version | long | long |
|
||||
| version | unsigned_long | long |
|
||||
| version | | long |
|
||||
|
||||
**Examples**
|
||||
|
||||
```esql
|
||||
FROM hosts
|
||||
| STATS COUNT_DISTINCT(ip0), COUNT_DISTINCT(ip1)
|
||||
```
|
||||
|
||||
| COUNT_DISTINCT(ip0):long | COUNT_DISTINCT(ip1):long |
|
||||
| --- | --- |
|
||||
| 7 | 8 |
|
||||
|
||||
With the optional second parameter to configure the precision threshold
|
||||
|
||||
```esql
|
||||
FROM hosts
|
||||
| STATS COUNT_DISTINCT(ip0, 80000), COUNT_DISTINCT(ip1, 5)
|
||||
```
|
||||
|
||||
| COUNT_DISTINCT(ip0, 80000):long | COUNT_DISTINCT(ip1, 5):long |
|
||||
| --- | --- |
|
||||
| 7 | 9 |
|
||||
|
||||
The expression can use inline functions. This example splits a string into multiple values using the `SPLIT` function and counts the unique values
|
||||
|
||||
```esql
|
||||
ROW words="foo;bar;baz;qux;quux;foo"
|
||||
| STATS distinct_word_count = COUNT_DISTINCT(SPLIT(words, ";"))
|
||||
```
|
||||
|
||||
| distinct_word_count:long |
|
||||
| --- |
|
||||
| 5 |
|
||||
|
||||
|
||||
### Counts are approximate [esql-agg-count-distinct-approximate]
|
||||
|
||||
Computing exact counts requires loading values into a set and returning its size. This doesn’t scale when working on high-cardinality sets and/or large values as the required memory usage and the need to communicate those per-shard sets between nodes would utilize too many resources of the cluster.
|
||||
|
||||
This `COUNT_DISTINCT` function is based on the [HyperLogLog++](https://static.googleusercontent.com/media/research.google.com/fr//pubs/archive/40671.pdf) algorithm, which counts based on the hashes of the values with some interesting properties:
|
||||
|
||||
* configurable precision, which decides on how to trade memory for accuracy,
|
||||
* excellent accuracy on low-cardinality sets,
|
||||
* fixed memory usage: no matter if there are tens or billions of unique values, memory usage only depends on the configured precision.
|
||||
|
||||
For a precision threshold of `c`, the implementation that we are using requires about `c * 8` bytes.
|
||||
|
||||
The following chart shows how the error varies before and after the threshold:
|
||||
|
||||

|
||||
|
||||
For all 3 thresholds, counts have been accurate up to the configured threshold. Although not guaranteed, this is likely to be the case. Accuracy in practice depends on the dataset in question. In general, most datasets show consistently good accuracy. Also note that even with a threshold as low as 100, the error remains very low (1-6% as seen in the above graph) even when counting millions of items.
|
||||
|
||||
The HyperLogLog++ algorithm depends on the leading zeros of hashed values, the exact distributions of hashes in a dataset can affect the accuracy of the cardinality.
|
||||
|
||||
The `COUNT_DISTINCT` function takes an optional second parameter to configure the precision threshold. The precision_threshold options allows to trade memory for accuracy, and defines a unique count below which counts are expected to be close to accurate. Above this value, counts might become a bit more fuzzy. The maximum supported value is 40000, thresholds above this number will have the same effect as a threshold of 40000. The default value is `3000`.
|
||||
|
||||
|
||||
## `MAX` [esql-max]
|
||||
|
||||
**Syntax**
|
||||
|
||||
:::{image} ../../../../images/max.svg
|
||||
:alt: Embedded
|
||||
:class: text-center
|
||||
:::
|
||||
|
||||
**Parameters**
|
||||
|
||||
true
|
||||
**Description**
|
||||
|
||||
The maximum value of a field.
|
||||
|
||||
**Supported types**
|
||||
|
||||
| field | result |
|
||||
| --- | --- |
|
||||
| boolean | boolean |
|
||||
| date | date |
|
||||
| date_nanos | date_nanos |
|
||||
| double | double |
|
||||
| integer | integer |
|
||||
| ip | ip |
|
||||
| keyword | keyword |
|
||||
| long | long |
|
||||
| text | keyword |
|
||||
| version | version |
|
||||
|
||||
**Examples**
|
||||
|
||||
```esql
|
||||
FROM employees
|
||||
| STATS MAX(languages)
|
||||
```
|
||||
|
||||
| MAX(languages):integer |
|
||||
| --- |
|
||||
| 5 |
|
||||
|
||||
The expression can use inline functions. For example, to calculate the maximum over an average of a multivalued column, use `MV_AVG` to first average the multiple values per row, and use the result with the `MAX` function
|
||||
|
||||
```esql
|
||||
FROM employees
|
||||
| STATS max_avg_salary_change = MAX(MV_AVG(salary_change))
|
||||
```
|
||||
|
||||
| max_avg_salary_change:double |
|
||||
| --- |
|
||||
| 13.75 |
|
||||
|
||||
|
||||
## `MEDIAN` [esql-median]
|
||||
|
||||
**Syntax**
|
||||
|
||||
:::{image} ../../../../images/median.svg
|
||||
:alt: Embedded
|
||||
:class: text-center
|
||||
:::
|
||||
|
||||
**Parameters**
|
||||
|
||||
true
|
||||
**Description**
|
||||
|
||||
The value that is greater than half of all values and less than half of all values, also known as the 50% [`PERCENTILE`](../esql-functions-operators.md#esql-percentile).
|
||||
|
||||
::::{note}
|
||||
Like [`PERCENTILE`](../esql-functions-operators.md#esql-percentile), `MEDIAN` is [usually approximate](../esql-functions-operators.md#esql-percentile-approximate).
|
||||
::::
|
||||
|
||||
|
||||
**Supported types**
|
||||
|
||||
| number | result |
|
||||
| --- | --- |
|
||||
| double | double |
|
||||
| integer | double |
|
||||
| long | double |
|
||||
|
||||
**Examples**
|
||||
|
||||
```esql
|
||||
FROM employees
|
||||
| STATS MEDIAN(salary), PERCENTILE(salary, 50)
|
||||
```
|
||||
|
||||
| MEDIAN(salary):double | PERCENTILE(salary, 50):double |
|
||||
| --- | --- |
|
||||
| 47003 | 47003 |
|
||||
|
||||
The expression can use inline functions. For example, to calculate the median of the maximum values of a multivalued column, first use `MV_MAX` to get the maximum value per row, and use the result with the `MEDIAN` function
|
||||
|
||||
```esql
|
||||
FROM employees
|
||||
| STATS median_max_salary_change = MEDIAN(MV_MAX(salary_change))
|
||||
```
|
||||
|
||||
| median_max_salary_change:double |
|
||||
| --- |
|
||||
| 7.69 |
|
||||
|
||||
::::{warning}
|
||||
`MEDIAN` is also [non-deterministic](https://en.wikipedia.org/wiki/Nondeterministic_algorithm). This means you can get slightly different results using the same data.
|
||||
|
||||
::::
|
||||
|
||||
|
||||
|
||||
## `MEDIAN_ABSOLUTE_DEVIATION` [esql-median_absolute_deviation]
|
||||
|
||||
**Syntax**
|
||||
|
||||
:::{image} ../../../../images/median_absolute_deviation.svg
|
||||
:alt: Embedded
|
||||
:class: text-center
|
||||
:::
|
||||
|
||||
**Parameters**
|
||||
|
||||
true
|
||||
**Description**
|
||||
|
||||
Returns the median absolute deviation, a measure of variability. It is a robust statistic, meaning that it is useful for describing data that may have outliers, or may not be normally distributed. For such data it can be more descriptive than standard deviation. It is calculated as the median of each data point’s deviation from the median of the entire sample. That is, for a random variable `X`, the median absolute deviation is `median(|median(X) - X|)`.
|
||||
|
||||
::::{note}
|
||||
Like [`PERCENTILE`](../esql-functions-operators.md#esql-percentile), `MEDIAN_ABSOLUTE_DEVIATION` is [usually approximate](../esql-functions-operators.md#esql-percentile-approximate).
|
||||
::::
|
||||
|
||||
|
||||
**Supported types**
|
||||
|
||||
| number | result |
|
||||
| --- | --- |
|
||||
| double | double |
|
||||
| integer | double |
|
||||
| long | double |
|
||||
|
||||
**Examples**
|
||||
|
||||
```esql
|
||||
FROM employees
|
||||
| STATS MEDIAN(salary), MEDIAN_ABSOLUTE_DEVIATION(salary)
|
||||
```
|
||||
|
||||
| MEDIAN(salary):double | MEDIAN_ABSOLUTE_DEVIATION(salary):double |
|
||||
| --- | --- |
|
||||
| 47003 | 10096.5 |
|
||||
|
||||
The expression can use inline functions. For example, to calculate the median absolute deviation of the maximum values of a multivalued column, first use `MV_MAX` to get the maximum value per row, and use the result with the `MEDIAN_ABSOLUTE_DEVIATION` function
|
||||
|
||||
```esql
|
||||
FROM employees
|
||||
| STATS m_a_d_max_salary_change = MEDIAN_ABSOLUTE_DEVIATION(MV_MAX(salary_change))
|
||||
```
|
||||
|
||||
| m_a_d_max_salary_change:double |
|
||||
| --- |
|
||||
| 5.69 |
|
||||
|
||||
::::{warning}
|
||||
`MEDIAN_ABSOLUTE_DEVIATION` is also [non-deterministic](https://en.wikipedia.org/wiki/Nondeterministic_algorithm). This means you can get slightly different results using the same data.
|
||||
|
||||
::::
|
||||
|
||||
|
||||
|
||||
## `MIN` [esql-min]
|
||||
|
||||
**Syntax**
|
||||
|
||||
:::{image} ../../../../images/min.svg
|
||||
:alt: Embedded
|
||||
:class: text-center
|
||||
:::
|
||||
|
||||
**Parameters**
|
||||
|
||||
true
|
||||
**Description**
|
||||
|
||||
The minimum value of a field.
|
||||
|
||||
**Supported types**
|
||||
|
||||
| field | result |
|
||||
| --- | --- |
|
||||
| boolean | boolean |
|
||||
| date | date |
|
||||
| date_nanos | date_nanos |
|
||||
| double | double |
|
||||
| integer | integer |
|
||||
| ip | ip |
|
||||
| keyword | keyword |
|
||||
| long | long |
|
||||
| text | keyword |
|
||||
| version | version |
|
||||
|
||||
**Examples**
|
||||
|
||||
```esql
|
||||
FROM employees
|
||||
| STATS MIN(languages)
|
||||
```
|
||||
|
||||
| MIN(languages):integer |
|
||||
| --- |
|
||||
| 1 |
|
||||
|
||||
The expression can use inline functions. For example, to calculate the minimum over an average of a multivalued column, use `MV_AVG` to first average the multiple values per row, and use the result with the `MIN` function
|
||||
|
||||
```esql
|
||||
FROM employees
|
||||
| STATS min_avg_salary_change = MIN(MV_AVG(salary_change))
|
||||
```
|
||||
|
||||
| min_avg_salary_change:double |
|
||||
| --- |
|
||||
| -8.46 |
|
||||
|
||||
|
||||
## `PERCENTILE` [esql-percentile]
|
||||
|
||||
**Syntax**
|
||||
|
||||
:::{image} ../../../../images/percentile.svg
|
||||
:alt: Embedded
|
||||
:class: text-center
|
||||
:::
|
||||
|
||||
**Parameters**
|
||||
|
||||
true
|
||||
**Description**
|
||||
|
||||
Returns the value at which a certain percentage of observed values occur. For example, the 95th percentile is the value which is greater than 95% of the observed values and the 50th percentile is the `MEDIAN`.
|
||||
|
||||
**Supported types**
|
||||
|
||||
| number | percentile | result |
|
||||
| --- | --- | --- |
|
||||
| double | double | double |
|
||||
| double | integer | double |
|
||||
| double | long | double |
|
||||
| integer | double | double |
|
||||
| integer | integer | double |
|
||||
| integer | long | double |
|
||||
| long | double | double |
|
||||
| long | integer | double |
|
||||
| long | long | double |
|
||||
|
||||
**Examples**
|
||||
|
||||
```esql
|
||||
FROM employees
|
||||
| STATS p0 = PERCENTILE(salary, 0)
|
||||
, p50 = PERCENTILE(salary, 50)
|
||||
, p99 = PERCENTILE(salary, 99)
|
||||
```
|
||||
|
||||
| p0:double | p50:double | p99:double |
|
||||
| --- | --- | --- |
|
||||
| 25324 | 47003 | 74970.29 |
|
||||
|
||||
The expression can use inline functions. For example, to calculate a percentile of the maximum values of a multivalued column, first use `MV_MAX` to get the maximum value per row, and use the result with the `PERCENTILE` function
|
||||
|
||||
```esql
|
||||
FROM employees
|
||||
| STATS p80_max_salary_change = PERCENTILE(MV_MAX(salary_change), 80)
|
||||
```
|
||||
|
||||
| p80_max_salary_change:double |
|
||||
| --- |
|
||||
| 12.132 |
|
||||
|
||||
|
||||
### `PERCENTILE` is (usually) approximate [esql-percentile-approximate]
|
||||
|
||||
There are many different algorithms to calculate percentiles. The naive implementation simply stores all the values in a sorted array. To find the 50th percentile, you simply find the value that is at `my_array[count(my_array) * 0.5]`.
|
||||
|
||||
Clearly, the naive implementation does not scale — the sorted array grows linearly with the number of values in your dataset. To calculate percentiles across potentially billions of values in an Elasticsearch cluster, *approximate* percentiles are calculated.
|
||||
|
||||
The algorithm used by the `percentile` metric is called TDigest (introduced by Ted Dunning in [Computing Accurate Quantiles using T-Digests](https://github.com/tdunning/t-digest/blob/master/docs/t-digest-paper/histo.pdf)).
|
||||
|
||||
When using this metric, there are a few guidelines to keep in mind:
|
||||
|
||||
* Accuracy is proportional to `q(1-q)`. This means that extreme percentiles (e.g. 99%) are more accurate than less extreme percentiles, such as the median
|
||||
* For small sets of values, percentiles are highly accurate (and potentially 100% accurate if the data is small enough).
|
||||
* As the quantity of values in a bucket grows, the algorithm begins to approximate the percentiles. It is effectively trading accuracy for memory savings. The exact level of inaccuracy is difficult to generalize, since it depends on your data distribution and volume of data being aggregated
|
||||
|
||||
The following chart shows the relative error on a uniform distribution depending on the number of collected values and the requested percentile:
|
||||
|
||||

|
||||
|
||||
It shows how precision is better for extreme percentiles. The reason why error diminishes for large number of values is that the law of large numbers makes the distribution of values more and more uniform and the t-digest tree can do a better job at summarizing it. It would not be the case on more skewed distributions.
|
||||
|
||||
::::{warning}
|
||||
`PERCENTILE` is also [non-deterministic](https://en.wikipedia.org/wiki/Nondeterministic_algorithm). This means you can get slightly different results using the same data.
|
||||
|
||||
::::
|
||||
|
||||
|
||||
|
||||
## `ST_CENTROID_AGG` [esql-st_centroid_agg]
|
||||
|
||||
**Syntax**
|
||||
|
||||
:::{image} ../../../../images/st_centroid_agg.svg
|
||||
:alt: Embedded
|
||||
:class: text-center
|
||||
:::
|
||||
|
||||
**Parameters**
|
||||
|
||||
true
|
||||
**Description**
|
||||
|
||||
Calculate the spatial centroid over a field with spatial point geometry type.
|
||||
|
||||
**Supported types**
|
||||
|
||||
| field | result |
|
||||
| --- | --- |
|
||||
| cartesian_point | cartesian_point |
|
||||
| geo_point | geo_point |
|
||||
|
||||
**Example**
|
||||
|
||||
```esql
|
||||
FROM airports
|
||||
| STATS centroid=ST_CENTROID_AGG(location)
|
||||
```
|
||||
|
||||
| centroid:geo_point |
|
||||
| --- |
|
||||
| POINT(-0.030548143003023033 24.37553649504829) |
|
||||
|
||||
|
||||
## `ST_EXTENT_AGG` [esql-st_extent_agg]
|
||||
|
||||
**Syntax**
|
||||
|
||||
:::{image} ../../../../images/st_extent_agg.svg
|
||||
:alt: Embedded
|
||||
:class: text-center
|
||||
:::
|
||||
|
||||
**Parameters**
|
||||
|
||||
true
|
||||
**Description**
|
||||
|
||||
Calculate the spatial extent over a field with geometry type. Returns a bounding box for all values of the field.
|
||||
|
||||
**Supported types**
|
||||
|
||||
| field | result |
|
||||
| --- | --- |
|
||||
| cartesian_point | cartesian_shape |
|
||||
| cartesian_shape | cartesian_shape |
|
||||
| geo_point | geo_shape |
|
||||
| geo_shape | geo_shape |
|
||||
|
||||
**Example**
|
||||
|
||||
```esql
|
||||
FROM airports
|
||||
| WHERE country == "India"
|
||||
| STATS extent = ST_EXTENT_AGG(location)
|
||||
```
|
||||
|
||||
| extent:geo_shape |
|
||||
| --- |
|
||||
| BBOX (70.77995480038226, 91.5882289968431, 33.9830909203738, 8.47650992218405) |
|
||||
|
||||
|
||||
## `STD_DEV` [esql-std_dev]
|
||||
|
||||
**Syntax**
|
||||
|
||||
:::{image} ../../../../images/std_dev.svg
|
||||
:alt: Embedded
|
||||
:class: text-center
|
||||
:::
|
||||
|
||||
**Parameters**
|
||||
|
||||
true
|
||||
**Description**
|
||||
|
||||
The standard deviation of a numeric field.
|
||||
|
||||
**Supported types**
|
||||
|
||||
| number | result |
|
||||
| --- | --- |
|
||||
| double | double |
|
||||
| integer | double |
|
||||
| long | double |
|
||||
|
||||
**Examples**
|
||||
|
||||
```esql
|
||||
FROM employees
|
||||
| STATS STD_DEV(height)
|
||||
```
|
||||
|
||||
| STD_DEV(height):double |
|
||||
| --- |
|
||||
| 0.20637044362020449 |
|
||||
|
||||
The expression can use inline functions. For example, to calculate the standard deviation of each employee’s maximum salary changes, first use `MV_MAX` on each row, and then use `STD_DEV` on the result
|
||||
|
||||
```esql
|
||||
FROM employees
|
||||
| STATS stddev_salary_change = STD_DEV(MV_MAX(salary_change))
|
||||
```
|
||||
|
||||
| stddev_salary_change:double |
|
||||
| --- |
|
||||
| 6.875829592924112 |
|
||||
|
||||
|
||||
## `SUM` [esql-sum]
|
||||
|
||||
**Syntax**
|
||||
|
||||
:::{image} ../../../../images/sum.svg
|
||||
:alt: Embedded
|
||||
:class: text-center
|
||||
:::
|
||||
|
||||
**Parameters**
|
||||
|
||||
true
|
||||
**Description**
|
||||
|
||||
The sum of a numeric expression.
|
||||
|
||||
**Supported types**
|
||||
|
||||
| number | result |
|
||||
| --- | --- |
|
||||
| double | double |
|
||||
| integer | long |
|
||||
| long | long |
|
||||
|
||||
**Examples**
|
||||
|
||||
```esql
|
||||
FROM employees
|
||||
| STATS SUM(languages)
|
||||
```
|
||||
|
||||
| SUM(languages):long |
|
||||
| --- |
|
||||
| 281 |
|
||||
|
||||
The expression can use inline functions. For example, to calculate the sum of each employee’s maximum salary changes, apply the `MV_MAX` function to each row and then sum the results
|
||||
|
||||
```esql
|
||||
FROM employees
|
||||
| STATS total_salary_changes = SUM(MV_MAX(salary_change))
|
||||
```
|
||||
|
||||
| total_salary_changes:double |
|
||||
| --- |
|
||||
| 446.75 |
|
||||
|
||||
|
||||
## `TOP` [esql-top]
|
||||
|
||||
**Syntax**
|
||||
|
||||
:::{image} ../../../../images/top.svg
|
||||
:alt: Embedded
|
||||
:class: text-center
|
||||
:::
|
||||
|
||||
**Parameters**
|
||||
|
||||
`field`
|
||||
: The field to collect the top values for.
|
||||
|
||||
`limit`
|
||||
: The maximum number of values to collect.
|
||||
|
||||
`order`
|
||||
: The order to calculate the top values. Either `asc` or `desc`.
|
||||
|
||||
**Description**
|
||||
|
||||
Collects the top values for a field. Includes repeated values.
|
||||
|
||||
**Supported types**
|
||||
|
||||
| field | limit | order | result |
|
||||
| --- | --- | --- | --- |
|
||||
| boolean | integer | keyword | boolean |
|
||||
| date | integer | keyword | date |
|
||||
| double | integer | keyword | double |
|
||||
| integer | integer | keyword | integer |
|
||||
| ip | integer | keyword | ip |
|
||||
| keyword | integer | keyword | keyword |
|
||||
| long | integer | keyword | long |
|
||||
| text | integer | keyword | keyword |
|
||||
|
||||
**Example**
|
||||
|
||||
```esql
|
||||
FROM employees
|
||||
| STATS top_salaries = TOP(salary, 3, "desc"), top_salary = MAX(salary)
|
||||
```
|
||||
|
||||
| top_salaries:integer | top_salary:integer |
|
||||
| --- | --- |
|
||||
| [74999, 74970, 74572] | 74999 |
|
||||
|
||||
|
||||
## `VALUES` [esql-values]
|
||||
|
||||
::::{warning}
|
||||
Do not use on production environments. This functionality is in technical preview and may be changed or removed in a future release. Elastic will work to fix any issues, but features in technical preview are not subject to the support SLA of official GA features.
|
||||
::::
|
||||
|
||||
|
||||
**Syntax**
|
||||
|
||||
:::{image} ../../../../images/values.svg
|
||||
:alt: Embedded
|
||||
:class: text-center
|
||||
:::
|
||||
|
||||
**Parameters**
|
||||
|
||||
true
|
||||
**Description**
|
||||
|
||||
Returns all values in a group as a multivalued field. The order of the returned values isn’t guaranteed. If you need the values returned in order use [`MV_SORT`](../esql-functions-operators.md#esql-mv_sort).
|
||||
|
||||
**Supported types**
|
||||
|
||||
| field | result |
|
||||
| --- | --- |
|
||||
| boolean | boolean |
|
||||
| date | date |
|
||||
| date_nanos | date_nanos |
|
||||
| double | double |
|
||||
| integer | integer |
|
||||
| ip | ip |
|
||||
| keyword | keyword |
|
||||
| long | long |
|
||||
| text | keyword |
|
||||
| version | version |
|
||||
|
||||
**Example**
|
||||
|
||||
```esql
|
||||
FROM employees
|
||||
| EVAL first_letter = SUBSTRING(first_name, 0, 1)
|
||||
| STATS first_name=MV_SORT(VALUES(first_name)) BY first_letter
|
||||
| SORT first_letter
|
||||
```
|
||||
|
||||
| first_name:keyword | first_letter:keyword |
|
||||
| --- | --- |
|
||||
| [Alejandro, Amabile, Anneke, Anoosh, Arumugam] | A |
|
||||
| [Basil, Berhard, Berni, Bezalel, Bojan, Breannda, Brendon] | B |
|
||||
| [Charlene, Chirstian, Claudi, Cristinel] | C |
|
||||
| [Danel, Divier, Domenick, Duangkaew] | D |
|
||||
| [Ebbe, Eberhardt, Erez] | E |
|
||||
| Florian | F |
|
||||
| [Gao, Georgi, Georgy, Gino, Guoxiang] | G |
|
||||
| [Heping, Hidefumi, Hilari, Hironobu, Hironoby, Hisao] | H |
|
||||
| [Jayson, Jungsoon] | J |
|
||||
| [Kazuhide, Kazuhito, Kendra, Kenroku, Kshitij, Kwee, Kyoichi] | K |
|
||||
| [Lillian, Lucien] | L |
|
||||
| [Magy, Margareta, Mary, Mayuko, Mayumi, Mingsen, Mokhtar, Mona, Moss] | M |
|
||||
| Otmar | O |
|
||||
| [Parto, Parviz, Patricio, Prasadram, Premal] | P |
|
||||
| [Ramzi, Remzi, Reuven] | R |
|
||||
| [Sailaja, Saniya, Sanjiv, Satosi, Shahaf, Shir, Somnath, Sreekrishna, Sudharsan, Sumant, Suzette] | S |
|
||||
| [Tse, Tuval, Tzvetan] | T |
|
||||
| [Udi, Uri] | U |
|
||||
| [Valdiodio, Valter, Vishv] | V |
|
||||
| Weiyi | W |
|
||||
| Xinglin | X |
|
||||
| [Yinghua, Yishay, Yongqiao] | Y |
|
||||
| [Zhongwei, Zvonko] | Z |
|
||||
| null | null |
|
||||
|
||||
::::{warning}
|
||||
This can use a significant amount of memory and ES|QL doesn’t yet grow aggregations beyond memory. So this aggregation will work until it is used to collect more values than can fit into memory. Once it collects too many values it will fail the query with a [Circuit Breaker Error](docs-content://troubleshoot/elasticsearch/circuit-breaker-errors.md).
|
||||
|
||||
::::
|
||||
|
||||
|
||||
|
||||
## `WEIGHTED_AVG` [esql-weighted_avg]
|
||||
|
||||
**Syntax**
|
||||
|
||||
:::{image} ../../../../images/weighted_avg.svg
|
||||
:alt: Embedded
|
||||
:class: text-center
|
||||
:::
|
||||
|
||||
**Parameters**
|
||||
|
||||
`number`
|
||||
: A numeric value.
|
||||
|
||||
`weight`
|
||||
: A numeric weight.
|
||||
|
||||
**Description**
|
||||
|
||||
The weighted average of a numeric expression.
|
||||
|
||||
**Supported types**
|
||||
|
||||
| number | weight | result |
|
||||
| --- | --- | --- |
|
||||
| double | double | double |
|
||||
| double | integer | double |
|
||||
| double | long | double |
|
||||
| integer | double | double |
|
||||
| integer | integer | double |
|
||||
| integer | long | double |
|
||||
| long | double | double |
|
||||
| long | integer | double |
|
||||
| long | long | double |
|
||||
|
||||
**Example**
|
||||
|
||||
```esql
|
||||
FROM employees
|
||||
| STATS w_avg = WEIGHTED_AVG(salary, height) by languages
|
||||
| EVAL w_avg = ROUND(w_avg)
|
||||
| KEEP w_avg, languages
|
||||
| SORT languages
|
||||
```
|
||||
|
||||
| w_avg:double | languages:integer |
|
||||
| --- | --- |
|
||||
| 51464.0 | 1 |
|
||||
| 48477.0 | 2 |
|
||||
| 52379.0 | 3 |
|
||||
| 47990.0 | 4 |
|
||||
| 42119.0 | 5 |
|
||||
| 52142.0 | null |
|
|
@ -1,21 +0,0 @@
|
|||
## {{esql}} conditional functions and expressions [esql-conditional-functions-and-expressions]
|
||||
|
||||
|
||||
Conditional functions return one of their arguments by evaluating in an if-else manner. {{esql}} supports these conditional functions:
|
||||
|
||||
:::{include} lists/conditional-functions-and-expressions.md
|
||||
:::
|
||||
|
||||
|
||||
:::{include} functions/case.md
|
||||
:::
|
||||
|
||||
:::{include} functions/coalesce.md
|
||||
:::
|
||||
|
||||
:::{include} functions/greatest.md
|
||||
:::
|
||||
|
||||
:::{include} functions/least.md
|
||||
:::
|
||||
|
|
@ -1,287 +0,0 @@
|
|||
## {{esql}} conditional functions and expressions [esql-conditional-functions-and-expressions]
|
||||
|
||||
|
||||
Conditional functions return one of their arguments by evaluating in an if-else manner. {{esql}} supports these conditional functions:
|
||||
|
||||
:::{include} lists/conditional-functions-and-expressions.md
|
||||
:::
|
||||
|
||||
|
||||
## `CASE` [esql-case]
|
||||
|
||||
**Syntax**
|
||||
|
||||
:::{image} ../../../../images/case.svg
|
||||
:alt: Embedded
|
||||
:class: text-center
|
||||
:::
|
||||
|
||||
**Parameters**
|
||||
|
||||
`condition`
|
||||
: A condition.
|
||||
|
||||
`trueValue`
|
||||
: The value that’s returned when the corresponding condition is the first to evaluate to `true`. The default value is returned when no condition matches.
|
||||
|
||||
`elseValue`
|
||||
: The value that’s returned when no condition evaluates to `true`.
|
||||
|
||||
**Description**
|
||||
|
||||
Accepts pairs of conditions and values. The function returns the value that belongs to the first condition that evaluates to `true`. If the number of arguments is odd, the last argument is the default value which is returned when no condition matches. If the number of arguments is even, and no condition matches, the function returns `null`.
|
||||
|
||||
**Supported types**
|
||||
|
||||
| condition | trueValue | elseValue | result |
|
||||
| --- | --- | --- | --- |
|
||||
| boolean | boolean | boolean | boolean |
|
||||
| boolean | boolean | | boolean |
|
||||
| boolean | cartesian_point | cartesian_point | cartesian_point |
|
||||
| boolean | cartesian_point | | cartesian_point |
|
||||
| boolean | cartesian_shape | cartesian_shape | cartesian_shape |
|
||||
| boolean | cartesian_shape | | cartesian_shape |
|
||||
| boolean | date | date | date |
|
||||
| boolean | date | | date |
|
||||
| boolean | date_nanos | date_nanos | date_nanos |
|
||||
| boolean | date_nanos | | date_nanos |
|
||||
| boolean | double | double | double |
|
||||
| boolean | double | | double |
|
||||
| boolean | geo_point | geo_point | geo_point |
|
||||
| boolean | geo_point | | geo_point |
|
||||
| boolean | geo_shape | geo_shape | geo_shape |
|
||||
| boolean | geo_shape | | geo_shape |
|
||||
| boolean | integer | integer | integer |
|
||||
| boolean | integer | | integer |
|
||||
| boolean | ip | ip | ip |
|
||||
| boolean | ip | | ip |
|
||||
| boolean | keyword | keyword | keyword |
|
||||
| boolean | keyword | text | keyword |
|
||||
| boolean | keyword | | keyword |
|
||||
| boolean | long | long | long |
|
||||
| boolean | long | | long |
|
||||
| boolean | text | keyword | keyword |
|
||||
| boolean | text | text | keyword |
|
||||
| boolean | text | | keyword |
|
||||
| boolean | unsigned_long | unsigned_long | unsigned_long |
|
||||
| boolean | unsigned_long | | unsigned_long |
|
||||
| boolean | version | version | version |
|
||||
| boolean | version | | version |
|
||||
|
||||
**Examples**
|
||||
|
||||
Determine whether employees are monolingual, bilingual, or polyglot:
|
||||
|
||||
```esql
|
||||
FROM employees
|
||||
| EVAL type = CASE(
|
||||
languages <= 1, "monolingual",
|
||||
languages <= 2, "bilingual",
|
||||
"polyglot")
|
||||
| KEEP emp_no, languages, type
|
||||
```
|
||||
|
||||
| emp_no:integer | languages:integer | type:keyword |
|
||||
| --- | --- | --- |
|
||||
| 10001 | 2 | bilingual |
|
||||
| 10002 | 5 | polyglot |
|
||||
| 10003 | 4 | polyglot |
|
||||
| 10004 | 5 | polyglot |
|
||||
| 10005 | 1 | monolingual |
|
||||
|
||||
Calculate the total connection success rate based on log messages:
|
||||
|
||||
```esql
|
||||
FROM sample_data
|
||||
| EVAL successful = CASE(
|
||||
STARTS_WITH(message, "Connected to"), 1,
|
||||
message == "Connection error", 0
|
||||
)
|
||||
| STATS success_rate = AVG(successful)
|
||||
```
|
||||
|
||||
| success_rate:double |
|
||||
| --- |
|
||||
| 0.5 |
|
||||
|
||||
Calculate an hourly error rate as a percentage of the total number of log messages:
|
||||
|
||||
```esql
|
||||
FROM sample_data
|
||||
| EVAL error = CASE(message LIKE "*error*", 1, 0)
|
||||
| EVAL hour = DATE_TRUNC(1 hour, @timestamp)
|
||||
| STATS error_rate = AVG(error) by hour
|
||||
| SORT hour
|
||||
```
|
||||
|
||||
| error_rate:double | hour:date |
|
||||
| --- | --- |
|
||||
| 0.0 | 2023-10-23T12:00:00.000Z |
|
||||
| 0.6 | 2023-10-23T13:00:00.000Z |
|
||||
|
||||
|
||||
## `COALESCE` [esql-coalesce]
|
||||
|
||||
**Syntax**
|
||||
|
||||
:::{image} ../../../../images/coalesce.svg
|
||||
:alt: Embedded
|
||||
:class: text-center
|
||||
:::
|
||||
|
||||
**Parameters**
|
||||
|
||||
`first`
|
||||
: Expression to evaluate.
|
||||
|
||||
`rest`
|
||||
: Other expression to evaluate.
|
||||
|
||||
**Description**
|
||||
|
||||
Returns the first of its arguments that is not null. If all arguments are null, it returns `null`.
|
||||
|
||||
**Supported types**
|
||||
|
||||
| first | rest | result |
|
||||
| --- | --- | --- |
|
||||
| boolean | boolean | boolean |
|
||||
| boolean | | boolean |
|
||||
| cartesian_point | cartesian_point | cartesian_point |
|
||||
| cartesian_shape | cartesian_shape | cartesian_shape |
|
||||
| date | date | date |
|
||||
| date_nanos | date_nanos | date_nanos |
|
||||
| geo_point | geo_point | geo_point |
|
||||
| geo_shape | geo_shape | geo_shape |
|
||||
| integer | integer | integer |
|
||||
| integer | | integer |
|
||||
| ip | ip | ip |
|
||||
| keyword | keyword | keyword |
|
||||
| keyword | | keyword |
|
||||
| long | long | long |
|
||||
| long | | long |
|
||||
| text | text | keyword |
|
||||
| text | | keyword |
|
||||
| version | version | version |
|
||||
|
||||
**Example**
|
||||
|
||||
```esql
|
||||
ROW a=null, b="b"
|
||||
| EVAL COALESCE(a, b)
|
||||
```
|
||||
|
||||
| a:null | b:keyword | COALESCE(a, b):keyword |
|
||||
| --- | --- | --- |
|
||||
| null | b | b |
|
||||
|
||||
|
||||
## `GREATEST` [esql-greatest]
|
||||
|
||||
**Syntax**
|
||||
|
||||
:::{image} ../../../../images/greatest.svg
|
||||
:alt: Embedded
|
||||
:class: text-center
|
||||
:::
|
||||
|
||||
**Parameters**
|
||||
|
||||
`first`
|
||||
: First of the columns to evaluate.
|
||||
|
||||
`rest`
|
||||
: The rest of the columns to evaluate.
|
||||
|
||||
**Description**
|
||||
|
||||
Returns the maximum value from multiple columns. This is similar to [`MV_MAX`](../esql-functions-operators.md#esql-mv_max) except it is intended to run on multiple columns at once.
|
||||
|
||||
::::{note}
|
||||
When run on `keyword` or `text` fields, this returns the last string in alphabetical order. When run on `boolean` columns this will return `true` if any values are `true`.
|
||||
::::
|
||||
|
||||
|
||||
**Supported types**
|
||||
|
||||
| first | rest | result |
|
||||
| --- | --- | --- |
|
||||
| boolean | boolean | boolean |
|
||||
| boolean | | boolean |
|
||||
| date | date | date |
|
||||
| date_nanos | date_nanos | date_nanos |
|
||||
| double | double | double |
|
||||
| integer | integer | integer |
|
||||
| integer | | integer |
|
||||
| ip | ip | ip |
|
||||
| keyword | keyword | keyword |
|
||||
| keyword | | keyword |
|
||||
| long | long | long |
|
||||
| long | | long |
|
||||
| text | text | keyword |
|
||||
| text | | keyword |
|
||||
| version | version | version |
|
||||
|
||||
**Example**
|
||||
|
||||
```esql
|
||||
ROW a = 10, b = 20
|
||||
| EVAL g = GREATEST(a, b)
|
||||
```
|
||||
|
||||
| a:integer | b:integer | g:integer |
|
||||
| --- | --- | --- |
|
||||
| 10 | 20 | 20 |
|
||||
|
||||
|
||||
## `LEAST` [esql-least]
|
||||
|
||||
**Syntax**
|
||||
|
||||
:::{image} ../../../../images/least.svg
|
||||
:alt: Embedded
|
||||
:class: text-center
|
||||
:::
|
||||
|
||||
**Parameters**
|
||||
|
||||
`first`
|
||||
: First of the columns to evaluate.
|
||||
|
||||
`rest`
|
||||
: The rest of the columns to evaluate.
|
||||
|
||||
**Description**
|
||||
|
||||
Returns the minimum value from multiple columns. This is similar to [`MV_MIN`](../esql-functions-operators.md#esql-mv_min) except it is intended to run on multiple columns at once.
|
||||
|
||||
**Supported types**
|
||||
|
||||
| first | rest | result |
|
||||
| --- | --- | --- |
|
||||
| boolean | boolean | boolean |
|
||||
| boolean | | boolean |
|
||||
| date | date | date |
|
||||
| date_nanos | date_nanos | date_nanos |
|
||||
| double | double | double |
|
||||
| integer | integer | integer |
|
||||
| integer | | integer |
|
||||
| ip | ip | ip |
|
||||
| keyword | keyword | keyword |
|
||||
| keyword | | keyword |
|
||||
| long | long | long |
|
||||
| long | | long |
|
||||
| text | text | keyword |
|
||||
| text | | keyword |
|
||||
| version | version | version |
|
||||
|
||||
**Example**
|
||||
|
||||
```esql
|
||||
ROW a = 10, b = 20
|
||||
| EVAL l = LEAST(a, b)
|
||||
```
|
||||
|
||||
| a:integer | b:integer | l:integer |
|
||||
| --- | --- | --- |
|
||||
| 10 | 20 | 10 |
|
|
@ -1,27 +0,0 @@
|
|||
## {{esql}} date-time functions [esql-date-time-functions]
|
||||
|
||||
|
||||
{{esql}} supports these date-time functions:
|
||||
|
||||
:::{include} lists/date-time-functions.md
|
||||
:::
|
||||
|
||||
|
||||
:::{include} functions/date_diff.md
|
||||
:::
|
||||
|
||||
:::{include} functions/date_extract.md
|
||||
:::
|
||||
|
||||
:::{include} functions/date_format.md
|
||||
:::
|
||||
|
||||
:::{include} functions/date_parse.md
|
||||
:::
|
||||
|
||||
:::{include} functions/date_trunc.md
|
||||
:::
|
||||
|
||||
:::{include} functions/now.md
|
||||
:::
|
||||
|
|
@ -1,359 +0,0 @@
|
|||
## {{esql}} date-time functions [esql-date-time-functions]
|
||||
|
||||
|
||||
{{esql}} supports these date-time functions:
|
||||
|
||||
:::{include} lists/date-time-functions.md
|
||||
:::
|
||||
|
||||
|
||||
## `DATE_DIFF` [esql-date_diff]
|
||||
|
||||
**Syntax**
|
||||
|
||||
:::{image} ../../../../images/date_diff.svg
|
||||
:alt: Embedded
|
||||
:class: text-center
|
||||
:::
|
||||
|
||||
**Parameters**
|
||||
|
||||
`unit`
|
||||
: Time difference unit
|
||||
|
||||
`startTimestamp`
|
||||
: A string representing a start timestamp
|
||||
|
||||
`endTimestamp`
|
||||
: A string representing an end timestamp
|
||||
|
||||
**Description**
|
||||
|
||||
Subtracts the `startTimestamp` from the `endTimestamp` and returns the difference in multiples of `unit`. If `startTimestamp` is later than the `endTimestamp`, negative values are returned.
|
||||
|
||||
| Datetime difference units |
|
||||
| --- |
|
||||
| **unit** | **abbreviations** |
|
||||
| year | years, yy, yyyy |
|
||||
| quarter | quarters, qq, q |
|
||||
| month | months, mm, m |
|
||||
| dayofyear | dy, y |
|
||||
| day | days, dd, d |
|
||||
| week | weeks, wk, ww |
|
||||
| weekday | weekdays, dw |
|
||||
| hour | hours, hh |
|
||||
| minute | minutes, mi, n |
|
||||
| second | seconds, ss, s |
|
||||
| millisecond | milliseconds, ms |
|
||||
| microsecond | microseconds, mcs |
|
||||
| nanosecond | nanoseconds, ns |
|
||||
|
||||
Note that while there is an overlap between the function’s supported units and {{esql}}'s supported time span literals, these sets are distinct and not interchangeable. Similarly, the supported abbreviations are conveniently shared with implementations of this function in other established products and not necessarily common with the date-time nomenclature used by {{es}}.
|
||||
|
||||
**Supported types**
|
||||
|
||||
| unit | startTimestamp | endTimestamp | result |
|
||||
| --- | --- | --- | --- |
|
||||
| keyword | date | date | integer |
|
||||
| keyword | date | date_nanos | integer |
|
||||
| keyword | date_nanos | date | integer |
|
||||
| keyword | date_nanos | date_nanos | integer |
|
||||
| text | date | date | integer |
|
||||
| text | date | date_nanos | integer |
|
||||
| text | date_nanos | date | integer |
|
||||
| text | date_nanos | date_nanos | integer |
|
||||
|
||||
**Examples**
|
||||
|
||||
```esql
|
||||
ROW date1 = TO_DATETIME("2023-12-02T11:00:00.000Z"), date2 = TO_DATETIME("2023-12-02T11:00:00.001Z")
|
||||
| EVAL dd_ms = DATE_DIFF("microseconds", date1, date2)
|
||||
```
|
||||
|
||||
| date1:date | date2:date | dd_ms:integer |
|
||||
| --- | --- | --- |
|
||||
| 2023-12-02T11:00:00.000Z | 2023-12-02T11:00:00.001Z | 1000 |
|
||||
|
||||
When subtracting in calendar units - like year, month a.s.o. - only the fully elapsed units are counted. To avoid this and obtain also remainders, simply switch to the next smaller unit and do the date math accordingly.
|
||||
|
||||
```esql
|
||||
ROW end_23=TO_DATETIME("2023-12-31T23:59:59.999Z"),
|
||||
start_24=TO_DATETIME("2024-01-01T00:00:00.000Z"),
|
||||
end_24=TO_DATETIME("2024-12-31T23:59:59.999")
|
||||
| EVAL end23_to_start24=DATE_DIFF("year", end_23, start_24)
|
||||
| EVAL end23_to_end24=DATE_DIFF("year", end_23, end_24)
|
||||
| EVAL start_to_end_24=DATE_DIFF("year", start_24, end_24)
|
||||
```
|
||||
|
||||
| end_23:date | start_24:date | end_24:date | end23_to_start24:integer | end23_to_end24:integer | start_to_end_24:integer |
|
||||
| --- | --- | --- | --- | --- | --- |
|
||||
| 2023-12-31T23:59:59.999Z | 2024-01-01T00:00:00.000Z | 2024-12-31T23:59:59.999Z | 0 | 1 | 0 |
|
||||
|
||||
|
||||
## `DATE_EXTRACT` [esql-date_extract]
|
||||
|
||||
**Syntax**
|
||||
|
||||
:::{image} ../../../../images/date_extract.svg
|
||||
:alt: Embedded
|
||||
:class: text-center
|
||||
:::
|
||||
|
||||
**Parameters**
|
||||
|
||||
`datePart`
|
||||
: Part of the date to extract. Can be: `aligned_day_of_week_in_month`, `aligned_day_of_week_in_year`, `aligned_week_of_month`, `aligned_week_of_year`, `ampm_of_day`, `clock_hour_of_ampm`, `clock_hour_of_day`, `day_of_month`, `day_of_week`, `day_of_year`, `epoch_day`, `era`, `hour_of_ampm`, `hour_of_day`, `instant_seconds`, `micro_of_day`, `micro_of_second`, `milli_of_day`, `milli_of_second`, `minute_of_day`, `minute_of_hour`, `month_of_year`, `nano_of_day`, `nano_of_second`, `offset_seconds`, `proleptic_month`, `second_of_day`, `second_of_minute`, `year`, or `year_of_era`. Refer to [java.time.temporal.ChronoField](https://docs.oracle.com/javase/8/docs/api/java/time/temporal/ChronoField.md) for a description of these values. If `null`, the function returns `null`.
|
||||
|
||||
`date`
|
||||
: Date expression. If `null`, the function returns `null`.
|
||||
|
||||
**Description**
|
||||
|
||||
Extracts parts of a date, like year, month, day, hour.
|
||||
|
||||
**Supported types**
|
||||
|
||||
| datePart | date | result |
|
||||
| --- | --- | --- |
|
||||
| keyword | date | long |
|
||||
| keyword | date_nanos | long |
|
||||
| text | date | long |
|
||||
| text | date_nanos | long |
|
||||
|
||||
**Examples**
|
||||
|
||||
```esql
|
||||
ROW date = DATE_PARSE("yyyy-MM-dd", "2022-05-06")
|
||||
| EVAL year = DATE_EXTRACT("year", date)
|
||||
```
|
||||
|
||||
| date:date | year:long |
|
||||
| --- | --- |
|
||||
| 2022-05-06T00:00:00.000Z | 2022 |
|
||||
|
||||
Find all events that occurred outside of business hours (before 9 AM or after 5PM), on any given date:
|
||||
|
||||
```esql
|
||||
FROM sample_data
|
||||
| WHERE DATE_EXTRACT("hour_of_day", @timestamp) < 9 AND DATE_EXTRACT("hour_of_day", @timestamp) >= 17
|
||||
```
|
||||
|
||||
| @timestamp:date | client_ip:ip | event_duration:long | message:keyword |
|
||||
| --- | --- | --- | --- |
|
||||
|
||||
|
||||
## `DATE_FORMAT` [esql-date_format]
|
||||
|
||||
**Syntax**
|
||||
|
||||
:::{image} ../../../../images/date_format.svg
|
||||
:alt: Embedded
|
||||
:class: text-center
|
||||
:::
|
||||
|
||||
**Parameters**
|
||||
|
||||
`dateFormat`
|
||||
: Date format (optional). If no format is specified, the `yyyy-MM-dd'T'HH:mm:ss.SSSZ` format is used. If `null`, the function returns `null`.
|
||||
|
||||
`date`
|
||||
: Date expression. If `null`, the function returns `null`.
|
||||
|
||||
**Description**
|
||||
|
||||
Returns a string representation of a date, in the provided format.
|
||||
|
||||
**Supported types**
|
||||
|
||||
| dateFormat | date | result |
|
||||
| --- | --- | --- |
|
||||
| date | | keyword |
|
||||
| date_nanos | | keyword |
|
||||
| keyword | date | keyword |
|
||||
| keyword | date_nanos | keyword |
|
||||
| text | date | keyword |
|
||||
| text | date_nanos | keyword |
|
||||
|
||||
**Example**
|
||||
|
||||
```esql
|
||||
FROM employees
|
||||
| KEEP first_name, last_name, hire_date
|
||||
| EVAL hired = DATE_FORMAT("yyyy-MM-dd", hire_date)
|
||||
```
|
||||
|
||||
| first_name:keyword | last_name:keyword | hire_date:date | hired:keyword |
|
||||
| --- | --- | --- | --- |
|
||||
| Alejandro | McAlpine | 1991-06-26T00:00:00.000Z | 1991-06-26 |
|
||||
| Amabile | Gomatam | 1992-11-18T00:00:00.000Z | 1992-11-18 |
|
||||
| Anneke | Preusig | 1989-06-02T00:00:00.000Z | 1989-06-02 |
|
||||
|
||||
|
||||
## `DATE_PARSE` [esql-date_parse]
|
||||
|
||||
**Syntax**
|
||||
|
||||
:::{image} ../../../../images/date_parse.svg
|
||||
:alt: Embedded
|
||||
:class: text-center
|
||||
:::
|
||||
|
||||
**Parameters**
|
||||
|
||||
`datePattern`
|
||||
: The date format. Refer to the [`DateTimeFormatter` documentation](https://docs.oracle.com/en/java/javase/14/docs/api/java.base/java/time/format/DateTimeFormatter.md) for the syntax. If `null`, the function returns `null`.
|
||||
|
||||
`dateString`
|
||||
: Date expression as a string. If `null` or an empty string, the function returns `null`.
|
||||
|
||||
**Description**
|
||||
|
||||
Returns a date by parsing the second argument using the format specified in the first argument.
|
||||
|
||||
**Supported types**
|
||||
|
||||
| datePattern | dateString | result |
|
||||
| --- | --- | --- |
|
||||
| keyword | keyword | date |
|
||||
| keyword | text | date |
|
||||
| text | keyword | date |
|
||||
| text | text | date |
|
||||
|
||||
**Example**
|
||||
|
||||
```esql
|
||||
ROW date_string = "2022-05-06"
|
||||
| EVAL date = DATE_PARSE("yyyy-MM-dd", date_string)
|
||||
```
|
||||
|
||||
| date_string:keyword | date:date |
|
||||
| --- | --- |
|
||||
| 2022-05-06 | 2022-05-06T00:00:00.000Z |
|
||||
|
||||
|
||||
## `DATE_TRUNC` [esql-date_trunc]
|
||||
|
||||
**Syntax**
|
||||
|
||||
:::{image} ../../../../images/date_trunc.svg
|
||||
:alt: Embedded
|
||||
:class: text-center
|
||||
:::
|
||||
|
||||
**Parameters**
|
||||
|
||||
`interval`
|
||||
: Interval; expressed using the timespan literal syntax.
|
||||
|
||||
`date`
|
||||
: Date expression
|
||||
|
||||
**Description**
|
||||
|
||||
Rounds down a date to the closest interval.
|
||||
|
||||
**Supported types**
|
||||
|
||||
| interval | date | result |
|
||||
| --- | --- | --- |
|
||||
| date_period | date | date |
|
||||
| date_period | date_nanos | date_nanos |
|
||||
| time_duration | date | date |
|
||||
| time_duration | date_nanos | date_nanos |
|
||||
|
||||
**Examples**
|
||||
|
||||
```esql
|
||||
FROM employees
|
||||
| KEEP first_name, last_name, hire_date
|
||||
| EVAL year_hired = DATE_TRUNC(1 year, hire_date)
|
||||
```
|
||||
|
||||
| first_name:keyword | last_name:keyword | hire_date:date | year_hired:date |
|
||||
| --- | --- | --- | --- |
|
||||
| Alejandro | McAlpine | 1991-06-26T00:00:00.000Z | 1991-01-01T00:00:00.000Z |
|
||||
| Amabile | Gomatam | 1992-11-18T00:00:00.000Z | 1992-01-01T00:00:00.000Z |
|
||||
| Anneke | Preusig | 1989-06-02T00:00:00.000Z | 1989-01-01T00:00:00.000Z |
|
||||
|
||||
Combine `DATE_TRUNC` with [`STATS`](/reference/query-languages/esql/esql-commands.md#esql-stats-by) to create date histograms. For example, the number of hires per year:
|
||||
|
||||
```esql
|
||||
FROM employees
|
||||
| EVAL year = DATE_TRUNC(1 year, hire_date)
|
||||
| STATS hires = COUNT(emp_no) BY year
|
||||
| SORT year
|
||||
```
|
||||
|
||||
| hires:long | year:date |
|
||||
| --- | --- |
|
||||
| 11 | 1985-01-01T00:00:00.000Z |
|
||||
| 11 | 1986-01-01T00:00:00.000Z |
|
||||
| 15 | 1987-01-01T00:00:00.000Z |
|
||||
| 9 | 1988-01-01T00:00:00.000Z |
|
||||
| 13 | 1989-01-01T00:00:00.000Z |
|
||||
| 12 | 1990-01-01T00:00:00.000Z |
|
||||
| 6 | 1991-01-01T00:00:00.000Z |
|
||||
| 8 | 1992-01-01T00:00:00.000Z |
|
||||
| 3 | 1993-01-01T00:00:00.000Z |
|
||||
| 4 | 1994-01-01T00:00:00.000Z |
|
||||
| 5 | 1995-01-01T00:00:00.000Z |
|
||||
| 1 | 1996-01-01T00:00:00.000Z |
|
||||
| 1 | 1997-01-01T00:00:00.000Z |
|
||||
| 1 | 1999-01-01T00:00:00.000Z |
|
||||
|
||||
Or an hourly error rate:
|
||||
|
||||
```esql
|
||||
FROM sample_data
|
||||
| EVAL error = CASE(message LIKE "*error*", 1, 0)
|
||||
| EVAL hour = DATE_TRUNC(1 hour, @timestamp)
|
||||
| STATS error_rate = AVG(error) by hour
|
||||
| SORT hour
|
||||
```
|
||||
|
||||
| error_rate:double | hour:date |
|
||||
| --- | --- |
|
||||
| 0.0 | 2023-10-23T12:00:00.000Z |
|
||||
| 0.6 | 2023-10-23T13:00:00.000Z |
|
||||
|
||||
|
||||
## `NOW` [esql-now]
|
||||
|
||||
**Syntax**
|
||||
|
||||
:::{image} ../../../../images/now.svg
|
||||
:alt: Embedded
|
||||
:class: text-center
|
||||
:::
|
||||
|
||||
**Parameters**
|
||||
|
||||
**Description**
|
||||
|
||||
Returns current date and time.
|
||||
|
||||
**Supported types**
|
||||
|
||||
| result |
|
||||
| --- |
|
||||
| date |
|
||||
|
||||
**Examples**
|
||||
|
||||
```esql
|
||||
ROW current_date = NOW()
|
||||
```
|
||||
|
||||
| y:keyword |
|
||||
| --- |
|
||||
| 20 |
|
||||
|
||||
To retrieve logs from the last hour:
|
||||
|
||||
```esql
|
||||
FROM sample_data
|
||||
| WHERE @timestamp > NOW() - 1 hour
|
||||
```
|
||||
|
||||
| @timestamp:date | client_ip:ip | event_duration:long | message:keyword |
|
||||
| --- | --- | --- | --- |
|
|
@ -1,51 +0,0 @@
|
|||
## `ABS` [esql-abs]
|
||||
|
||||
**Syntax**
|
||||
|
||||
:::{image} ../../../../../images/abs.svg
|
||||
:alt: Embedded
|
||||
:class: text-center
|
||||
:::
|
||||
|
||||
**Parameters**
|
||||
|
||||
`number`
|
||||
: Numeric expression. If `null`, the function returns `null`.
|
||||
|
||||
**Description**
|
||||
|
||||
Returns the absolute value.
|
||||
|
||||
**Supported types**
|
||||
|
||||
| number | result |
|
||||
| --- | --- |
|
||||
| double | double |
|
||||
| integer | integer |
|
||||
| long | long |
|
||||
| unsigned_long | unsigned_long |
|
||||
|
||||
**Examples**
|
||||
|
||||
```esql
|
||||
ROW number = -1.0
|
||||
| EVAL abs_number = ABS(number)
|
||||
```
|
||||
|
||||
| number:double | abs_number:double |
|
||||
| --- | --- |
|
||||
| -1.0 | 1.0 |
|
||||
|
||||
```esql
|
||||
FROM employees
|
||||
| KEEP first_name, last_name, height
|
||||
| EVAL abs_height = ABS(0.0 - height)
|
||||
```
|
||||
|
||||
| first_name:keyword | last_name:keyword | height:double | abs_height:double |
|
||||
| --- | --- | --- | --- |
|
||||
| Alejandro | McAlpine | 1.48 | 1.48 |
|
||||
| Amabile | Gomatam | 2.09 | 2.09 |
|
||||
| Anneke | Preusig | 1.56 | 1.56 |
|
||||
|
||||
|
|
@ -1,39 +0,0 @@
|
|||
## `ACOS` [esql-acos]
|
||||
|
||||
**Syntax**
|
||||
|
||||
:::{image} ../../../../../images/acos.svg
|
||||
:alt: Embedded
|
||||
:class: text-center
|
||||
:::
|
||||
|
||||
**Parameters**
|
||||
|
||||
`number`
|
||||
: Number between -1 and 1. If `null`, the function returns `null`.
|
||||
|
||||
**Description**
|
||||
|
||||
Returns the [arccosine](https://en.wikipedia.org/wiki/Inverse_trigonometric_functions) of `n` as an angle, expressed in radians.
|
||||
|
||||
**Supported types**
|
||||
|
||||
| number | result |
|
||||
| --- | --- |
|
||||
| double | double |
|
||||
| integer | double |
|
||||
| long | double |
|
||||
| unsigned_long | double |
|
||||
|
||||
**Example**
|
||||
|
||||
```esql
|
||||
ROW a=.9
|
||||
| EVAL acos=ACOS(a)
|
||||
```
|
||||
|
||||
| a:double | acos:double |
|
||||
| --- | --- |
|
||||
| .9 | 0.45102681179626236 |
|
||||
|
||||
|
24
docs/reference/query-languages/esql/_snippets/functions/appendix/count_distinct.md
generated
Normal file
24
docs/reference/query-languages/esql/_snippets/functions/appendix/count_distinct.md
generated
Normal file
|
@ -0,0 +1,24 @@
|
|||
% This is generated by ESQL's AbstractFunctionTestCase. Do no edit it. See ../README.md for how to regenerate it.
|
||||
|
||||
### Counts are approximate [esql-agg-count-distinct-approximate]
|
||||
|
||||
Computing exact counts requires loading values into a set and returning its
|
||||
size. This doesn’t scale when working on high-cardinality sets and/or large
|
||||
values as the required memory usage and the need to communicate those
|
||||
per-shard sets between nodes would utilize too many resources of the cluster.
|
||||
|
||||
This `COUNT_DISTINCT` function is based on the
|
||||
[HyperLogLog++](https://static.googleusercontent.com/media/research.google.com/fr//pubs/archive/40671.pdf)
|
||||
algorithm, which counts based on the hashes of the values with some interesting
|
||||
properties:
|
||||
|
||||
:::{include} /reference/data-analysis/aggregations/_snippets/search-aggregations-metrics-cardinality-aggregation-explanation.md
|
||||
:::
|
||||
|
||||
The `COUNT_DISTINCT` function takes an optional second parameter to configure
|
||||
the precision threshold. The `precision_threshold` options allows to trade memory
|
||||
for accuracy, and defines a unique count below which counts are expected to be
|
||||
close to accurate. Above this value, counts might become a bit more fuzzy. The
|
||||
maximum supported value is `40000`, thresholds above this number will have the
|
||||
same effect as a threshold of `40000`. The default value is `3000`.
|
||||
|
|
@ -0,0 +1,6 @@
|
|||
% This is generated by ESQL's AbstractFunctionTestCase. Do no edit it. See ../README.md for how to regenerate it.
|
||||
|
||||
::::{warning}
|
||||
`MEDIAN` is also [non-deterministic](https://en.wikipedia.org/wiki/Nondeterministic_algorithm).
|
||||
This means you can get slightly different results using the same data.
|
||||
::::
|
6
docs/reference/query-languages/esql/_snippets/functions/appendix/median_absolute_deviation.md
generated
Normal file
6
docs/reference/query-languages/esql/_snippets/functions/appendix/median_absolute_deviation.md
generated
Normal file
|
@ -0,0 +1,6 @@
|
|||
% This is generated by ESQL's AbstractFunctionTestCase. Do no edit it. See ../README.md for how to regenerate it.
|
||||
|
||||
::::{warning}
|
||||
`MEDIAN_ABSOLUTE_DEVIATION` is also [non-deterministic](https://en.wikipedia.org/wiki/Nondeterministic_algorithm).
|
||||
This means you can get slightly different results using the same data.
|
||||
::::
|
11
docs/reference/query-languages/esql/_snippets/functions/appendix/percentile.md
generated
Normal file
11
docs/reference/query-languages/esql/_snippets/functions/appendix/percentile.md
generated
Normal file
|
@ -0,0 +1,11 @@
|
|||
% This is generated by ESQL's AbstractFunctionTestCase. Do no edit it. See ../README.md for how to regenerate it.
|
||||
|
||||
### `PERCENTILE` is (usually) approximate [esql-percentile-approximate]
|
||||
|
||||
:::{include} /reference/data-analysis/aggregations/_snippets/search-aggregations-metrics-percentile-aggregation-approximate.md
|
||||
:::
|
||||
|
||||
::::{warning}
|
||||
`PERCENTILE` is also [non-deterministic](https://en.wikipedia.org/wiki/Nondeterministic_algorithm).
|
||||
This means you can get slightly different results using the same data.
|
||||
::::
|
|
@ -0,0 +1,9 @@
|
|||
% This is generated by ESQL's AbstractFunctionTestCase. Do no edit it. See ../README.md for how to regenerate it.
|
||||
|
||||
::::{warning}
|
||||
This can use a significant amount of memory and ES|QL doesn’t yet
|
||||
grow aggregations beyond memory. So this aggregation will work until
|
||||
it is used to collect more values than can fit into memory. Once it
|
||||
collects too many values it will fail the query with
|
||||
a [Circuit Breaker Error](docs-content://troubleshoot/elasticsearch/circuit-breaker-errors.md).
|
||||
::::
|
|
@ -1,39 +0,0 @@
|
|||
## `ASIN` [esql-asin]
|
||||
|
||||
**Syntax**
|
||||
|
||||
:::{image} ../../../../../images/asin.svg
|
||||
:alt: Embedded
|
||||
:class: text-center
|
||||
:::
|
||||
|
||||
**Parameters**
|
||||
|
||||
`number`
|
||||
: Number between -1 and 1. If `null`, the function returns `null`.
|
||||
|
||||
**Description**
|
||||
|
||||
Returns the [arcsine](https://en.wikipedia.org/wiki/Inverse_trigonometric_functions) of the input numeric expression as an angle, expressed in radians.
|
||||
|
||||
**Supported types**
|
||||
|
||||
| number | result |
|
||||
| --- | --- |
|
||||
| double | double |
|
||||
| integer | double |
|
||||
| long | double |
|
||||
| unsigned_long | double |
|
||||
|
||||
**Example**
|
||||
|
||||
```esql
|
||||
ROW a=.9
|
||||
| EVAL asin=ASIN(a)
|
||||
```
|
||||
|
||||
| a:double | asin:double |
|
||||
| --- | --- |
|
||||
| .9 | 1.1197695149986342 |
|
||||
|
||||
|
|
@ -1,39 +0,0 @@
|
|||
## `ATAN` [esql-atan]
|
||||
|
||||
**Syntax**
|
||||
|
||||
:::{image} ../../../../../images/atan.svg
|
||||
:alt: Embedded
|
||||
:class: text-center
|
||||
:::
|
||||
|
||||
**Parameters**
|
||||
|
||||
`number`
|
||||
: Numeric expression. If `null`, the function returns `null`.
|
||||
|
||||
**Description**
|
||||
|
||||
Returns the [arctangent](https://en.wikipedia.org/wiki/Inverse_trigonometric_functions) of the input numeric expression as an angle, expressed in radians.
|
||||
|
||||
**Supported types**
|
||||
|
||||
| number | result |
|
||||
| --- | --- |
|
||||
| double | double |
|
||||
| integer | double |
|
||||
| long | double |
|
||||
| unsigned_long | double |
|
||||
|
||||
**Example**
|
||||
|
||||
```esql
|
||||
ROW a=12.9
|
||||
| EVAL atan=ATAN(a)
|
||||
```
|
||||
|
||||
| a:double | atan:double |
|
||||
| --- | --- |
|
||||
| 12.9 | 1.4934316673669235 |
|
||||
|
||||
|
|
@ -1,54 +0,0 @@
|
|||
## `ATAN2` [esql-atan2]
|
||||
|
||||
**Syntax**
|
||||
|
||||
:::{image} ../../../../../images/atan2.svg
|
||||
:alt: Embedded
|
||||
:class: text-center
|
||||
:::
|
||||
|
||||
**Parameters**
|
||||
|
||||
`y_coordinate`
|
||||
: y coordinate. If `null`, the function returns `null`.
|
||||
|
||||
`x_coordinate`
|
||||
: x coordinate. If `null`, the function returns `null`.
|
||||
|
||||
**Description**
|
||||
|
||||
The [angle](https://en.wikipedia.org/wiki/Atan2) between the positive x-axis and the ray from the origin to the point (x , y) in the Cartesian plane, expressed in radians.
|
||||
|
||||
**Supported types**
|
||||
|
||||
| y_coordinate | x_coordinate | result |
|
||||
| --- | --- | --- |
|
||||
| double | double | double |
|
||||
| double | integer | double |
|
||||
| double | long | double |
|
||||
| double | unsigned_long | double |
|
||||
| integer | double | double |
|
||||
| integer | integer | double |
|
||||
| integer | long | double |
|
||||
| integer | unsigned_long | double |
|
||||
| long | double | double |
|
||||
| long | integer | double |
|
||||
| long | long | double |
|
||||
| long | unsigned_long | double |
|
||||
| unsigned_long | double | double |
|
||||
| unsigned_long | integer | double |
|
||||
| unsigned_long | long | double |
|
||||
| unsigned_long | unsigned_long | double |
|
||||
|
||||
**Example**
|
||||
|
||||
```esql
|
||||
ROW y=12.9, x=.6
|
||||
| EVAL atan2=ATAN2(y, x)
|
||||
```
|
||||
|
||||
| y:double | x:double | atan2:double |
|
||||
| --- | --- | --- |
|
||||
| 12.9 | 0.6 | 1.5243181954438936 |
|
||||
|
||||
|
|
@ -1,47 +0,0 @@
|
|||
## `AVG` [esql-avg]
|
||||
|
||||
**Syntax**
|
||||
|
||||
:::{image} ../../../../../images/avg.svg
|
||||
:alt: Embedded
|
||||
:class: text-center
|
||||
:::
|
||||
|
||||
**Parameters**
|
||||
|
||||
true
|
||||
**Description**
|
||||
|
||||
The average of a numeric field.
|
||||
|
||||
**Supported types**
|
||||
|
||||
| number | result |
|
||||
| --- | --- |
|
||||
| double | double |
|
||||
| integer | double |
|
||||
| long | double |
|
||||
|
||||
**Examples**
|
||||
|
||||
```esql
|
||||
FROM employees
|
||||
| STATS AVG(height)
|
||||
```
|
||||
|
||||
| AVG(height):double |
|
||||
| --- |
|
||||
| 1.7682 |
|
||||
|
||||
The expression can use inline functions. For example, to calculate the average over a multivalued column, first use `MV_AVG` to average the multiple values per row, and use the result with the `AVG` function
|
||||
|
||||
```esql
|
||||
FROM employees
|
||||
| STATS avg_salary_change = ROUND(AVG(MV_AVG(salary_change)), 10)
|
||||
```
|
||||
|
||||
| avg_salary_change:double |
|
||||
| --- |
|
||||
| 1.3904535865 |
|
||||
|
||||
|
|
@ -1,46 +0,0 @@
|
|||
## `BIT_LENGTH` [esql-bit_length]
|
||||
|
||||
**Syntax**
|
||||
|
||||
:::{image} ../../../../../images/bit_length.svg
|
||||
:alt: Embedded
|
||||
:class: text-center
|
||||
:::
|
||||
|
||||
**Parameters**
|
||||
|
||||
`string`
|
||||
: String expression. If `null`, the function returns `null`.
|
||||
|
||||
**Description**
|
||||
|
||||
Returns the bit length of a string.
|
||||
|
||||
::::{note}
|
||||
All strings are in UTF-8, so a single character can use multiple bytes.
|
||||
::::
|
||||
|
||||
|
||||
**Supported types**
|
||||
|
||||
| string | result |
|
||||
| --- | --- |
|
||||
| keyword | integer |
|
||||
| text | integer |
|
||||
|
||||
**Example**
|
||||
|
||||
```esql
|
||||
FROM airports
|
||||
| WHERE country == "India"
|
||||
| KEEP city
|
||||
| EVAL fn_length = LENGTH(city), fn_bit_length = BIT_LENGTH(city)
|
||||
```
|
||||
|
||||
| city:keyword | fn_length:integer | fn_bit_length:integer |
|
||||
| --- | --- | --- |
|
||||
| Agwār | 5 | 48 |
|
||||
| Ahmedabad | 9 | 72 |
|
||||
| Bangalore | 9 | 72 |
|
||||
|
||||
|
|
@ -1,296 +0,0 @@
|
|||
## `BUCKET` [esql-bucket]
|
||||
|
||||
**Syntax**
|
||||
|
||||
:::{image} ../../../../../images/bucket.svg
|
||||
:alt: Embedded
|
||||
:class: text-center
|
||||
:::
|
||||
|
||||
**Parameters**
|
||||
|
||||
`field`
|
||||
: Numeric or date expression from which to derive buckets.
|
||||
|
||||
`buckets`
|
||||
: Target number of buckets, or desired bucket size if `from` and `to` parameters are omitted.
|
||||
|
||||
`from`
|
||||
: Start of the range. Can be a number, a date or a date expressed as a string.
|
||||
|
||||
`to`
|
||||
: End of the range. Can be a number, a date or a date expressed as a string.
|
||||
|
||||
**Description**
|
||||
|
||||
Creates groups of values - buckets - out of a datetime or numeric input. The size of the buckets can either be provided directly, or chosen based on a recommended count and values range.
|
||||
|
||||
**Supported types**
|
||||
|
||||
| field | buckets | from | to | result |
|
||||
| --- | --- | --- | --- | --- |
|
||||
| date | date_period | | | date |
|
||||
| date | integer | date | date | date |
|
||||
| date | integer | date | keyword | date |
|
||||
| date | integer | date | text | date |
|
||||
| date | integer | keyword | date | date |
|
||||
| date | integer | keyword | keyword | date |
|
||||
| date | integer | keyword | text | date |
|
||||
| date | integer | text | date | date |
|
||||
| date | integer | text | keyword | date |
|
||||
| date | integer | text | text | date |
|
||||
| date | time_duration | | | date |
|
||||
| date_nanos | date_period | | | date_nanos |
|
||||
| date_nanos | integer | date | date | date_nanos |
|
||||
| date_nanos | integer | date | keyword | date_nanos |
|
||||
| date_nanos | integer | date | text | date_nanos |
|
||||
| date_nanos | integer | keyword | date | date_nanos |
|
||||
| date_nanos | integer | keyword | keyword | date_nanos |
|
||||
| date_nanos | integer | keyword | text | date_nanos |
|
||||
| date_nanos | integer | text | date | date_nanos |
|
||||
| date_nanos | integer | text | keyword | date_nanos |
|
||||
| date_nanos | integer | text | text | date_nanos |
|
||||
| date_nanos | time_duration | | | date_nanos |
|
||||
| double | double | | | double |
|
||||
| double | integer | double | double | double |
|
||||
| double | integer | double | integer | double |
|
||||
| double | integer | double | long | double |
|
||||
| double | integer | integer | double | double |
|
||||
| double | integer | integer | integer | double |
|
||||
| double | integer | integer | long | double |
|
||||
| double | integer | long | double | double |
|
||||
| double | integer | long | integer | double |
|
||||
| double | integer | long | long | double |
|
||||
| double | integer | | | double |
|
||||
| double | long | | | double |
|
||||
| integer | double | | | double |
|
||||
| integer | integer | double | double | double |
|
||||
| integer | integer | double | integer | double |
|
||||
| integer | integer | double | long | double |
|
||||
| integer | integer | integer | double | double |
|
||||
| integer | integer | integer | integer | double |
|
||||
| integer | integer | integer | long | double |
|
||||
| integer | integer | long | double | double |
|
||||
| integer | integer | long | integer | double |
|
||||
| integer | integer | long | long | double |
|
||||
| integer | integer | | | double |
|
||||
| integer | long | | | double |
|
||||
| long | double | | | double |
|
||||
| long | integer | double | double | double |
|
||||
| long | integer | double | integer | double |
|
||||
| long | integer | double | long | double |
|
||||
| long | integer | integer | double | double |
|
||||
| long | integer | integer | integer | double |
|
||||
| long | integer | integer | long | double |
|
||||
| long | integer | long | double | double |
|
||||
| long | integer | long | integer | double |
|
||||
| long | integer | long | long | double |
|
||||
| long | integer | | | double |
|
||||
| long | long | | | double |
|
||||
|
||||
**Examples**
|
||||
|
||||
`BUCKET` can work in two modes: one in which the size of the bucket is computed based on a buckets count recommendation (four parameters) and a range, and another in which the bucket size is provided directly (two parameters).
|
||||
|
||||
Using a target number of buckets, a start of a range, and an end of a range, `BUCKET` picks an appropriate bucket size to generate the target number of buckets or fewer. For example, asking for at most 20 buckets over a year results in monthly buckets:
|
||||
|
||||
```esql
|
||||
FROM employees
|
||||
| WHERE hire_date >= "1985-01-01T00:00:00Z" AND hire_date < "1986-01-01T00:00:00Z"
|
||||
| STATS hire_date = MV_SORT(VALUES(hire_date)) BY month = BUCKET(hire_date, 20, "1985-01-01T00:00:00Z", "1986-01-01T00:00:00Z")
|
||||
| SORT hire_date
|
||||
```
|
||||
|
||||
| hire_date:date | month:date |
|
||||
| --- | --- |
|
||||
| [1985-02-18T00:00:00.000Z, 1985-02-24T00:00:00.000Z] | 1985-02-01T00:00:00.000Z |
|
||||
| 1985-05-13T00:00:00.000Z | 1985-05-01T00:00:00.000Z |
|
||||
| 1985-07-09T00:00:00.000Z | 1985-07-01T00:00:00.000Z |
|
||||
| 1985-09-17T00:00:00.000Z | 1985-09-01T00:00:00.000Z |
|
||||
| [1985-10-14T00:00:00.000Z, 1985-10-20T00:00:00.000Z] | 1985-10-01T00:00:00.000Z |
|
||||
| [1985-11-19T00:00:00.000Z, 1985-11-20T00:00:00.000Z, 1985-11-21T00:00:00.000Z] | 1985-11-01T00:00:00.000Z |
|
||||
|
||||
The goal isn’t to provide **exactly** the target number of buckets, it’s to pick a range that people are comfortable with that provides at most the target number of buckets.
|
||||
|
||||
Combine `BUCKET` with an [aggregation](../../esql-functions-operators.md#esql-agg-functions) to create a histogram:
|
||||
|
||||
```esql
|
||||
FROM employees
|
||||
| WHERE hire_date >= "1985-01-01T00:00:00Z" AND hire_date < "1986-01-01T00:00:00Z"
|
||||
| STATS hires_per_month = COUNT(*) BY month = BUCKET(hire_date, 20, "1985-01-01T00:00:00Z", "1986-01-01T00:00:00Z")
|
||||
| SORT month
|
||||
```
|
||||
|
||||
| hires_per_month:long | month:date |
|
||||
| --- | --- |
|
||||
| 2 | 1985-02-01T00:00:00.000Z |
|
||||
| 1 | 1985-05-01T00:00:00.000Z |
|
||||
| 1 | 1985-07-01T00:00:00.000Z |
|
||||
| 1 | 1985-09-01T00:00:00.000Z |
|
||||
| 2 | 1985-10-01T00:00:00.000Z |
|
||||
| 4 | 1985-11-01T00:00:00.000Z |
|
||||
|
||||
::::{note}
|
||||
`BUCKET` does not create buckets that don’t match any documents. That’s why this example is missing `1985-03-01` and other dates.
|
||||
::::
|
||||
|
||||
|
||||
Asking for more buckets can result in a smaller range. For example, asking for at most 100 buckets in a year results in weekly buckets:
|
||||
|
||||
```esql
|
||||
FROM employees
|
||||
| WHERE hire_date >= "1985-01-01T00:00:00Z" AND hire_date < "1986-01-01T00:00:00Z"
|
||||
| STATS hires_per_week = COUNT(*) BY week = BUCKET(hire_date, 100, "1985-01-01T00:00:00Z", "1986-01-01T00:00:00Z")
|
||||
| SORT week
|
||||
```
|
||||
|
||||
| hires_per_week:long | week:date |
|
||||
| --- | --- |
|
||||
| 2 | 1985-02-18T00:00:00.000Z |
|
||||
| 1 | 1985-05-13T00:00:00.000Z |
|
||||
| 1 | 1985-07-08T00:00:00.000Z |
|
||||
| 1 | 1985-09-16T00:00:00.000Z |
|
||||
| 2 | 1985-10-14T00:00:00.000Z |
|
||||
| 4 | 1985-11-18T00:00:00.000Z |
|
||||
|
||||
::::{note}
|
||||
`BUCKET` does not filter any rows. It only uses the provided range to pick a good bucket size. For rows with a value outside of the range, it returns a bucket value that corresponds to a bucket outside the range. Combine`BUCKET` with [`WHERE`](/reference/query-languages/esql/esql-commands.md#esql-where) to filter rows.
|
||||
::::
|
||||
|
||||
|
||||
If the desired bucket size is known in advance, simply provide it as the second argument, leaving the range out:
|
||||
|
||||
```esql
|
||||
FROM employees
|
||||
| WHERE hire_date >= "1985-01-01T00:00:00Z" AND hire_date < "1986-01-01T00:00:00Z"
|
||||
| STATS hires_per_week = COUNT(*) BY week = BUCKET(hire_date, 1 week)
|
||||
| SORT week
|
||||
```
|
||||
|
||||
| hires_per_week:long | week:date |
|
||||
| --- | --- |
|
||||
| 2 | 1985-02-18T00:00:00.000Z |
|
||||
| 1 | 1985-05-13T00:00:00.000Z |
|
||||
| 1 | 1985-07-08T00:00:00.000Z |
|
||||
| 1 | 1985-09-16T00:00:00.000Z |
|
||||
| 2 | 1985-10-14T00:00:00.000Z |
|
||||
| 4 | 1985-11-18T00:00:00.000Z |
|
||||
|
||||
::::{note}
|
||||
When providing the bucket size as the second parameter, it must be a time duration or date period.
|
||||
::::
|
||||
|
||||
|
||||
`BUCKET` can also operate on numeric fields. For example, to create a salary histogram:
|
||||
|
||||
```esql
|
||||
FROM employees
|
||||
| STATS COUNT(*) by bs = BUCKET(salary, 20, 25324, 74999)
|
||||
| SORT bs
|
||||
```
|
||||
|
||||
| COUNT(*):long | bs:double |
|
||||
| --- | --- |
|
||||
| 9 | 25000.0 |
|
||||
| 9 | 30000.0 |
|
||||
| 18 | 35000.0 |
|
||||
| 11 | 40000.0 |
|
||||
| 11 | 45000.0 |
|
||||
| 10 | 50000.0 |
|
||||
| 7 | 55000.0 |
|
||||
| 9 | 60000.0 |
|
||||
| 8 | 65000.0 |
|
||||
| 8 | 70000.0 |
|
||||
|
||||
Unlike the earlier example that intentionally filters on a date range, you rarely want to filter on a numeric range. You have to find the `min` and `max` separately. {{esql}} doesn’t yet have an easy way to do that automatically.
|
||||
|
||||
The range can be omitted if the desired bucket size is known in advance. Simply provide it as the second argument:
|
||||
|
||||
```esql
|
||||
FROM employees
|
||||
| WHERE hire_date >= "1985-01-01T00:00:00Z" AND hire_date < "1986-01-01T00:00:00Z"
|
||||
| STATS c = COUNT(1) BY b = BUCKET(salary, 5000.)
|
||||
| SORT b
|
||||
```
|
||||
|
||||
| c:long | b:double |
|
||||
| --- | --- |
|
||||
| 1 | 25000.0 |
|
||||
| 1 | 30000.0 |
|
||||
| 1 | 40000.0 |
|
||||
| 2 | 45000.0 |
|
||||
| 2 | 50000.0 |
|
||||
| 1 | 55000.0 |
|
||||
| 1 | 60000.0 |
|
||||
| 1 | 65000.0 |
|
||||
| 1 | 70000.0 |
|
||||
|
||||
Create hourly buckets for the last 24 hours, and calculate the number of events per hour:
|
||||
|
||||
```esql
|
||||
FROM sample_data
|
||||
| WHERE @timestamp >= NOW() - 1 day and @timestamp < NOW()
|
||||
| STATS COUNT(*) BY bucket = BUCKET(@timestamp, 25, NOW() - 1 day, NOW())
|
||||
```
|
||||
|
||||
| COUNT(*):long | bucket:date |
|
||||
| --- | --- |
|
||||
|
||||
Create monthly buckets for the year 1985, and calculate the average salary by hiring month
|
||||
|
||||
```esql
|
||||
FROM employees
|
||||
| WHERE hire_date >= "1985-01-01T00:00:00Z" AND hire_date < "1986-01-01T00:00:00Z"
|
||||
| STATS AVG(salary) BY bucket = BUCKET(hire_date, 20, "1985-01-01T00:00:00Z", "1986-01-01T00:00:00Z")
|
||||
| SORT bucket
|
||||
```
|
||||
|
||||
| AVG(salary):double | bucket:date |
|
||||
| --- | --- |
|
||||
| 46305.0 | 1985-02-01T00:00:00.000Z |
|
||||
| 44817.0 | 1985-05-01T00:00:00.000Z |
|
||||
| 62405.0 | 1985-07-01T00:00:00.000Z |
|
||||
| 49095.0 | 1985-09-01T00:00:00.000Z |
|
||||
| 51532.0 | 1985-10-01T00:00:00.000Z |
|
||||
| 54539.75 | 1985-11-01T00:00:00.000Z |
|
||||
|
||||
`BUCKET` may be used in both the aggregating and grouping part of the [STATS … BY …](/reference/query-languages/esql/esql-commands.md#esql-stats-by) command provided that in the aggregating part the function is referenced by an alias defined in the grouping part, or that it is invoked with the exact same expression:
|
||||
|
||||
```esql
|
||||
FROM employees
|
||||
| STATS s1 = b1 + 1, s2 = BUCKET(salary / 1000 + 999, 50.) + 2 BY b1 = BUCKET(salary / 100 + 99, 50.), b2 = BUCKET(salary / 1000 + 999, 50.)
|
||||
| SORT b1, b2
|
||||
| KEEP s1, b1, s2, b2
|
||||
```
|
||||
|
||||
| s1:double | b1:double | s2:double | b2:double |
|
||||
| --- | --- | --- | --- |
|
||||
| 351.0 | 350.0 | 1002.0 | 1000.0 |
|
||||
| 401.0 | 400.0 | 1002.0 | 1000.0 |
|
||||
| 451.0 | 450.0 | 1002.0 | 1000.0 |
|
||||
| 501.0 | 500.0 | 1002.0 | 1000.0 |
|
||||
| 551.0 | 550.0 | 1002.0 | 1000.0 |
|
||||
| 601.0 | 600.0 | 1002.0 | 1000.0 |
|
||||
| 601.0 | 600.0 | 1052.0 | 1050.0 |
|
||||
| 651.0 | 650.0 | 1052.0 | 1050.0 |
|
||||
| 701.0 | 700.0 | 1052.0 | 1050.0 |
|
||||
| 751.0 | 750.0 | 1052.0 | 1050.0 |
|
||||
| 801.0 | 800.0 | 1052.0 | 1050.0 |
|
||||
|
||||
Sometimes you need to change the start value of each bucket by a given duration (similar to date histogram aggregation’s [`offset`](/reference/data-analysis/aggregations/search-aggregations-bucket-histogram-aggregation.md) parameter). To do so, you will need to take into account how the language handles expressions within the `STATS` command: if these contain functions or arithmetic operators, a virtual `EVAL` is inserted before and/or after the `STATS` command. Consequently, a double compensation is needed to adjust the bucketed date value before the aggregation and then again after. For instance, inserting a negative offset of `1 hour` to buckets of `1 year` looks like this:
|
||||
|
||||
```esql
|
||||
FROM employees
|
||||
| STATS dates = MV_SORT(VALUES(birth_date)) BY b = BUCKET(birth_date + 1 HOUR, 1 YEAR) - 1 HOUR
|
||||
| EVAL d_count = MV_COUNT(dates)
|
||||
| SORT d_count, b
|
||||
| LIMIT 3
|
||||
```
|
||||
|
||||
| dates:date | b:date | d_count:integer |
|
||||
| --- | --- | --- |
|
||||
| 1965-01-03T00:00:00.000Z | 1964-12-31T23:00:00.000Z | 1 |
|
||||
| [1955-01-21T00:00:00.000Z, 1955-08-20T00:00:00.000Z, 1955-08-28T00:00:00.000Z, 1955-10-04T00:00:00.000Z] | 1954-12-31T23:00:00.000Z | 4 |
|
||||
| [1957-04-04T00:00:00.000Z, 1957-05-23T00:00:00.000Z, 1957-05-25T00:00:00.000Z, 1957-12-03T00:00:00.000Z] | 1956-12-31T23:00:00.000Z | 4 |
|
||||
|
||||
|
|
@ -1,46 +0,0 @@
|
|||
## `BYTE_LENGTH` [esql-byte_length]
|
||||
|
||||
**Syntax**
|
||||
|
||||
:::{image} ../../../../../images/byte_length.svg
|
||||
:alt: Embedded
|
||||
:class: text-center
|
||||
:::
|
||||
|
||||
**Parameters**
|
||||
|
||||
`string`
|
||||
: String expression. If `null`, the function returns `null`.
|
||||
|
||||
**Description**
|
||||
|
||||
Returns the byte length of a string.
|
||||
|
||||
::::{note}
|
||||
All strings are in UTF-8, so a single character can use multiple bytes.
|
||||
::::
|
||||
|
||||
|
||||
**Supported types**
|
||||
|
||||
| string | result |
|
||||
| --- | --- |
|
||||
| keyword | integer |
|
||||
| text | integer |
|
||||
|
||||
**Example**
|
||||
|
||||
```esql
|
||||
FROM airports
|
||||
| WHERE country == "India"
|
||||
| KEEP city
|
||||
| EVAL fn_length = LENGTH(city), fn_byte_length = BYTE_LENGTH(city)
|
||||
```
|
||||
|
||||
| city:keyword | fn_length:integer | fn_byte_length:integer |
|
||||
| --- | --- | --- |
|
||||
| Agwār | 5 | 6 |
|
||||
| Ahmedabad | 9 | 9 |
|
||||
| Bangalore | 9 | 9 |
|
||||
|
||||
|
|
@ -1,113 +0,0 @@
|
|||
## `CASE` [esql-case]
|
||||
|
||||
**Syntax**
|
||||
|
||||
:::{image} ../../../../../images/case.svg
|
||||
:alt: Embedded
|
||||
:class: text-center
|
||||
:::
|
||||
|
||||
**Parameters**
|
||||
|
||||
`condition`
|
||||
: A condition.
|
||||
|
||||
`trueValue`
|
||||
: The value that’s returned when the corresponding condition is the first to evaluate to `true`. The default value is returned when no condition matches.
|
||||
|
||||
`elseValue`
|
||||
: The value that’s returned when no condition evaluates to `true`.
|
||||
|
||||
**Description**
|
||||
|
||||
Accepts pairs of conditions and values. The function returns the value that belongs to the first condition that evaluates to `true`. If the number of arguments is odd, the last argument is the default value which is returned when no condition matches. If the number of arguments is even, and no condition matches, the function returns `null`.
|
||||
|
||||
**Supported types**
|
||||
|
||||
| condition | trueValue | elseValue | result |
|
||||
| --- | --- | --- | --- |
|
||||
| boolean | boolean | boolean | boolean |
|
||||
| boolean | boolean | | boolean |
|
||||
| boolean | cartesian_point | cartesian_point | cartesian_point |
|
||||
| boolean | cartesian_point | | cartesian_point |
|
||||
| boolean | cartesian_shape | cartesian_shape | cartesian_shape |
|
||||
| boolean | cartesian_shape | | cartesian_shape |
|
||||
| boolean | date | date | date |
|
||||
| boolean | date | | date |
|
||||
| boolean | date_nanos | date_nanos | date_nanos |
|
||||
| boolean | date_nanos | | date_nanos |
|
||||
| boolean | double | double | double |
|
||||
| boolean | double | | double |
|
||||
| boolean | geo_point | geo_point | geo_point |
|
||||
| boolean | geo_point | | geo_point |
|
||||
| boolean | geo_shape | geo_shape | geo_shape |
|
||||
| boolean | geo_shape | | geo_shape |
|
||||
| boolean | integer | integer | integer |
|
||||
| boolean | integer | | integer |
|
||||
| boolean | ip | ip | ip |
|
||||
| boolean | ip | | ip |
|
||||
| boolean | keyword | keyword | keyword |
|
||||
| boolean | keyword | text | keyword |
|
||||
| boolean | keyword | | keyword |
|
||||
| boolean | long | long | long |
|
||||
| boolean | long | | long |
|
||||
| boolean | text | keyword | keyword |
|
||||
| boolean | text | text | keyword |
|
||||
| boolean | text | | keyword |
|
||||
| boolean | unsigned_long | unsigned_long | unsigned_long |
|
||||
| boolean | unsigned_long | | unsigned_long |
|
||||
| boolean | version | version | version |
|
||||
| boolean | version | | version |
|
||||
|
||||
**Examples**
|
||||
|
||||
Determine whether employees are monolingual, bilingual, or polyglot:
|
||||
|
||||
```esql
|
||||
FROM employees
|
||||
| EVAL type = CASE(
|
||||
languages <= 1, "monolingual",
|
||||
languages <= 2, "bilingual",
|
||||
"polyglot")
|
||||
| KEEP emp_no, languages, type
|
||||
```
|
||||
|
||||
| emp_no:integer | languages:integer | type:keyword |
|
||||
| --- | --- | --- |
|
||||
| 10001 | 2 | bilingual |
|
||||
| 10002 | 5 | polyglot |
|
||||
| 10003 | 4 | polyglot |
|
||||
| 10004 | 5 | polyglot |
|
||||
| 10005 | 1 | monolingual |
|
||||
|
||||
Calculate the total connection success rate based on log messages:
|
||||
|
||||
```esql
|
||||
FROM sample_data
|
||||
| EVAL successful = CASE(
|
||||
STARTS_WITH(message, "Connected to"), 1,
|
||||
message == "Connection error", 0
|
||||
)
|
||||
| STATS success_rate = AVG(successful)
|
||||
```
|
||||
|
||||
| success_rate:double |
|
||||
| --- |
|
||||
| 0.5 |
|
||||
|
||||
Calculate an hourly error rate as a percentage of the total number of log messages:
|
||||
|
||||
```esql
|
||||
FROM sample_data
|
||||
| EVAL error = CASE(message LIKE "*error*", 1, 0)
|
||||
| EVAL hour = DATE_TRUNC(1 hour, @timestamp)
|
||||
| STATS error_rate = AVG(error) by hour
|
||||
| SORT hour
|
||||
```
|
||||
|
||||
| error_rate:double | hour:date |
|
||||
| --- | --- |
|
||||
| 0.0 | 2023-10-23T12:00:00.000Z |
|
||||
| 0.6 | 2023-10-23T13:00:00.000Z |
|
||||
|
||||
|
|
@ -1,50 +0,0 @@
|
|||
## `CATEGORIZE` [esql-categorize]
|
||||
|
||||
::::{warning}
|
||||
Do not use on production environments. This functionality is in technical preview and may be changed or removed in a future release. Elastic will work to fix any issues, but features in technical preview are not subject to the support SLA of official GA features.
|
||||
::::
|
||||
|
||||
|
||||
**Syntax**
|
||||
|
||||
:::{image} ../../../../../images/categorize.svg
|
||||
:alt: Embedded
|
||||
:class: text-center
|
||||
:::
|
||||
|
||||
**Parameters**
|
||||
|
||||
`field`
|
||||
: Expression to categorize
|
||||
|
||||
**Description**
|
||||
|
||||
Groups text messages into categories of similarly formatted text values.
|
||||
|
||||
`CATEGORIZE` has the following limitations:
|
||||
|
||||
* can’t be used within other expressions
|
||||
* can’t be used with multiple groupings
|
||||
* can’t be used or referenced within aggregate functions
|
||||
|
||||
**Supported types**
|
||||
|
||||
| field | result |
|
||||
| --- | --- |
|
||||
| keyword | keyword |
|
||||
| text | keyword |
|
||||
|
||||
**Example**
|
||||
|
||||
This example categorizes server logs messages into categories and aggregates their counts.
|
||||
|
||||
```esql
|
||||
FROM sample_data
|
||||
| STATS count=COUNT() BY category=CATEGORIZE(message)
|
||||
```
|
||||
|
||||
| count:long | category:keyword |
|
||||
| --- | --- |
|
||||
| 3 | .**?Connected.+?to.**? |
|
||||
| 3 | .**?Connection.+?error.**? |
|
||||
| 1 | .**?Disconnected.**? |
|
|
@ -1,39 +0,0 @@
|
|||
## `CBRT` [esql-cbrt]
|
||||
|
||||
**Syntax**
|
||||
|
||||
:::{image} ../../../../../images/cbrt.svg
|
||||
:alt: Embedded
|
||||
:class: text-center
|
||||
:::
|
||||
|
||||
**Parameters**
|
||||
|
||||
`number`
|
||||
: Numeric expression. If `null`, the function returns `null`.
|
||||
|
||||
**Description**
|
||||
|
||||
Returns the cube root of a number. The input can be any numeric value, the return value is always a double. Cube roots of infinities are null.
|
||||
|
||||
**Supported types**
|
||||
|
||||
| number | result |
|
||||
| --- | --- |
|
||||
| double | double |
|
||||
| integer | double |
|
||||
| long | double |
|
||||
| unsigned_long | double |
|
||||
|
||||
**Example**
|
||||
|
||||
```esql
|
||||
ROW d = 1000.0
|
||||
| EVAL c = cbrt(d)
|
||||
```
|
||||
|
||||
| d: double | c:double |
|
||||
| --- | --- |
|
||||
| 1000.0 | 10.0 |
|
||||
|
||||
|
|
@ -1,44 +0,0 @@
|
|||
## `CEIL` [esql-ceil]
|
||||
|
||||
**Syntax**
|
||||
|
||||
:::{image} ../../../../../images/ceil.svg
|
||||
:alt: Embedded
|
||||
:class: text-center
|
||||
:::
|
||||
|
||||
**Parameters**
|
||||
|
||||
`number`
|
||||
: Numeric expression. If `null`, the function returns `null`.
|
||||
|
||||
**Description**
|
||||
|
||||
Round a number up to the nearest integer.
|
||||
|
||||
::::{note}
|
||||
This is a noop for `long` (including unsigned) and `integer`. For `double` this picks the closest `double` value to the integer similar to [Math.ceil](https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/lang/Math.md#ceil(double)).
|
||||
::::
|
||||
|
||||
|
||||
**Supported types**
|
||||
|
||||
| number | result |
|
||||
| --- | --- |
|
||||
| double | double |
|
||||
| integer | integer |
|
||||
| long | long |
|
||||
| unsigned_long | unsigned_long |
|
||||
|
||||
**Example**
|
||||
|
||||
```esql
|
||||
ROW a=1.8
|
||||
| EVAL a=CEIL(a)
|
||||
```
|
||||
|
||||
| a:double |
|
||||
| --- |
|
||||
| 2 |
|
||||
|
||||
|
|
@ -1,42 +0,0 @@
|
|||
## `CIDR_MATCH` [esql-cidr_match]
|
||||
|
||||
**Syntax**
|
||||
|
||||
:::{image} ../../../../../images/cidr_match.svg
|
||||
:alt: Embedded
|
||||
:class: text-center
|
||||
:::
|
||||
|
||||
**Parameters**
|
||||
|
||||
`ip`
|
||||
: IP address of type `ip` (both IPv4 and IPv6 are supported).
|
||||
|
||||
`blockX`
|
||||
: CIDR block to test the IP against.
|
||||
|
||||
**Description**
|
||||
|
||||
Returns true if the provided IP is contained in one of the provided CIDR blocks.
|
||||
|
||||
**Supported types**
|
||||
|
||||
| ip | blockX | result |
|
||||
| --- | --- | --- |
|
||||
| ip | keyword | boolean |
|
||||
| ip | text | boolean |
|
||||
|
||||
**Example**
|
||||
|
||||
```esql
|
||||
FROM hosts
|
||||
| WHERE CIDR_MATCH(ip1, "127.0.0.2/32", "127.0.0.3/32")
|
||||
| KEEP card, host, ip0, ip1
|
||||
```
|
||||
|
||||
| card:keyword | host:keyword | ip0:ip | ip1:ip |
|
||||
| --- | --- | --- | --- |
|
||||
| eth1 | beta | 127.0.0.1 | 127.0.0.2 |
|
||||
| eth0 | gamma | fe80::cae2:65ff:fece:feb9 | 127.0.0.3 |
|
||||
|
||||
|
|
@ -1,56 +0,0 @@
|
|||
## `COALESCE` [esql-coalesce]
|
||||
|
||||
**Syntax**
|
||||
|
||||
:::{image} ../../../../../images/coalesce.svg
|
||||
:alt: Embedded
|
||||
:class: text-center
|
||||
:::
|
||||
|
||||
**Parameters**
|
||||
|
||||
`first`
|
||||
: Expression to evaluate.
|
||||
|
||||
`rest`
|
||||
: Other expression to evaluate.
|
||||
|
||||
**Description**
|
||||
|
||||
Returns the first of its arguments that is not null. If all arguments are null, it returns `null`.
|
||||
|
||||
**Supported types**
|
||||
|
||||
| first | rest | result |
|
||||
| --- | --- | --- |
|
||||
| boolean | boolean | boolean |
|
||||
| boolean | | boolean |
|
||||
| cartesian_point | cartesian_point | cartesian_point |
|
||||
| cartesian_shape | cartesian_shape | cartesian_shape |
|
||||
| date | date | date |
|
||||
| date_nanos | date_nanos | date_nanos |
|
||||
| geo_point | geo_point | geo_point |
|
||||
| geo_shape | geo_shape | geo_shape |
|
||||
| integer | integer | integer |
|
||||
| integer | | integer |
|
||||
| ip | ip | ip |
|
||||
| keyword | keyword | keyword |
|
||||
| keyword | | keyword |
|
||||
| long | long | long |
|
||||
| long | | long |
|
||||
| text | text | keyword |
|
||||
| text | | keyword |
|
||||
| version | version | version |
|
||||
|
||||
**Example**
|
||||
|
||||
```esql
|
||||
ROW a=null, b="b"
|
||||
| EVAL COALESCE(a, b)
|
||||
```
|
||||
|
||||
| a:null | b:keyword | COALESCE(a, b):keyword |
|
||||
| --- | --- | --- |
|
||||
| null | b | b |
|
||||
|
||||
|
|
@ -1,45 +0,0 @@
|
|||
## `CONCAT` [esql-concat]
|
||||
|
||||
**Syntax**
|
||||
|
||||
:::{image} ../../../../../images/concat.svg
|
||||
:alt: Embedded
|
||||
:class: text-center
|
||||
:::
|
||||
|
||||
**Parameters**
|
||||
|
||||
`string1`
|
||||
: Strings to concatenate.
|
||||
|
||||
`string2`
|
||||
: Strings to concatenate.
|
||||
|
||||
**Description**
|
||||
|
||||
Concatenates two or more strings.
|
||||
|
||||
**Supported types**
|
||||
|
||||
| string1 | string2 | result |
|
||||
| --- | --- | --- |
|
||||
| keyword | keyword | keyword |
|
||||
| keyword | text | keyword |
|
||||
| text | keyword | keyword |
|
||||
| text | text | keyword |
|
||||
|
||||
**Example**
|
||||
|
||||
```esql
|
||||
FROM employees
|
||||
| KEEP first_name, last_name
|
||||
| EVAL fullname = CONCAT(first_name, " ", last_name)
|
||||
```
|
||||
|
||||
| first_name:keyword | last_name:keyword | fullname:keyword |
|
||||
| --- | --- | --- |
|
||||
| Alejandro | McAlpine | Alejandro McAlpine |
|
||||
| Amabile | Gomatam | Amabile Gomatam |
|
||||
| Anneke | Preusig | Anneke Preusig |
|
||||
|
||||
|
|
@ -1,39 +0,0 @@
|
|||
## `COS` [esql-cos]
|
||||
|
||||
**Syntax**
|
||||
|
||||
:::{image} ../../../../../images/cos.svg
|
||||
:alt: Embedded
|
||||
:class: text-center
|
||||
:::
|
||||
|
||||
**Parameters**
|
||||
|
||||
`angle`
|
||||
: An angle, in radians. If `null`, the function returns `null`.
|
||||
|
||||
**Description**
|
||||
|
||||
Returns the [cosine](https://en.wikipedia.org/wiki/Sine_and_cosine) of an angle.
|
||||
|
||||
**Supported types**
|
||||
|
||||
| angle | result |
|
||||
| --- | --- |
|
||||
| double | double |
|
||||
| integer | double |
|
||||
| long | double |
|
||||
| unsigned_long | double |
|
||||
|
||||
**Example**
|
||||
|
||||
```esql
|
||||
ROW a=1.8
|
||||
| EVAL cos=COS(a)
|
||||
```
|
||||
|
||||
| a:double | cos:double |
|
||||
| --- | --- |
|
||||
| 1.8 | -0.2272020946930871 |
|
||||
|
||||
|
|
@ -1,39 +0,0 @@
|
|||
## `COSH` [esql-cosh]
|
||||
|
||||
**Syntax**
|
||||
|
||||
:::{image} ../../../../../images/cosh.svg
|
||||
:alt: Embedded
|
||||
:class: text-center
|
||||
:::
|
||||
|
||||
**Parameters**
|
||||
|
||||
`number`
|
||||
: Numeric expression. If `null`, the function returns `null`.
|
||||
|
||||
**Description**
|
||||
|
||||
Returns the [hyperbolic cosine](https://en.wikipedia.org/wiki/Hyperbolic_functions) of a number.
|
||||
|
||||
**Supported types**
|
||||
|
||||
| number | result |
|
||||
| --- | --- |
|
||||
| double | double |
|
||||
| integer | double |
|
||||
| long | double |
|
||||
| unsigned_long | double |
|
||||
|
||||
**Example**
|
||||
|
||||
```esql
|
||||
ROW a=1.8
|
||||
| EVAL cosh=COSH(a)
|
||||
```
|
||||
|
||||
| a:double | cosh:double |
|
||||
| --- | --- |
|
||||
| 1.8 | 3.1074731763172667 |
|
||||
|
||||
|
|
@ -1,98 +0,0 @@
|
|||
## `COUNT` [esql-count]
|
||||
|
||||
**Syntax**
|
||||
|
||||
:::{image} ../../../../../images/count.svg
|
||||
:alt: Embedded
|
||||
:class: text-center
|
||||
:::
|
||||
|
||||
**Parameters**
|
||||
|
||||
`field`
|
||||
: Expression that outputs values to be counted. If omitted, equivalent to `COUNT(*)` (the number of rows).
|
||||
|
||||
**Description**
|
||||
|
||||
Returns the total number (count) of input values.
|
||||
|
||||
**Supported types**
|
||||
|
||||
| field | result |
|
||||
| --- | --- |
|
||||
| boolean | long |
|
||||
| cartesian_point | long |
|
||||
| date | long |
|
||||
| double | long |
|
||||
| geo_point | long |
|
||||
| integer | long |
|
||||
| ip | long |
|
||||
| keyword | long |
|
||||
| long | long |
|
||||
| text | long |
|
||||
| unsigned_long | long |
|
||||
| version | long |
|
||||
|
||||
**Examples**
|
||||
|
||||
```esql
|
||||
FROM employees
|
||||
| STATS COUNT(height)
|
||||
```
|
||||
|
||||
| COUNT(height):long |
|
||||
| --- |
|
||||
| 100 |
|
||||
|
||||
To count the number of rows, use `COUNT()` or `COUNT(*)`
|
||||
|
||||
```esql
|
||||
FROM employees
|
||||
| STATS count = COUNT(*) BY languages
|
||||
| SORT languages DESC
|
||||
```
|
||||
|
||||
| count:long | languages:integer |
|
||||
| --- | --- |
|
||||
| 10 | null |
|
||||
| 21 | 5 |
|
||||
| 18 | 4 |
|
||||
| 17 | 3 |
|
||||
| 19 | 2 |
|
||||
| 15 | 1 |
|
||||
|
||||
The expression can use inline functions. This example splits a string into multiple values using the `SPLIT` function and counts the values
|
||||
|
||||
```esql
|
||||
ROW words="foo;bar;baz;qux;quux;foo"
|
||||
| STATS word_count = COUNT(SPLIT(words, ";"))
|
||||
```
|
||||
|
||||
| word_count:long |
|
||||
| --- |
|
||||
| 6 |
|
||||
|
||||
To count the number of times an expression returns `TRUE` use a [`WHERE`](/reference/query-languages/esql/esql-commands.md#esql-where) command to remove rows that shouldn’t be included
|
||||
|
||||
```esql
|
||||
ROW n=1
|
||||
| WHERE n < 0
|
||||
| STATS COUNT(n)
|
||||
```
|
||||
|
||||
| COUNT(n):long |
|
||||
| --- |
|
||||
| 0 |
|
||||
|
||||
To count the same stream of data based on two different expressions use the pattern `COUNT(<expression> OR NULL)`. This builds on the three-valued logic ({{wikipedia}}/Three-valued_logic[3VL]) of the language: `TRUE OR NULL` is `TRUE`, but `FALSE OR NULL` is `NULL`, plus the way COUNT handles `NULL`s: `COUNT(TRUE)` and `COUNT(FALSE)` are both 1, but `COUNT(NULL)` is 0.
|
||||
|
||||
```esql
|
||||
ROW n=1
|
||||
| STATS COUNT(n > 0 OR NULL), COUNT(n < 0 OR NULL)
|
||||
```
|
||||
|
||||
| COUNT(n > 0 OR NULL):long | COUNT(n < 0 OR NULL):long |
|
||||
| --- | --- |
|
||||
| 1 | 0 |
|
||||
|
||||
|
|
@ -1,123 +0,0 @@
|
|||
## `COUNT_DISTINCT` [esql-count_distinct]
|
||||
|
||||
**Syntax**
|
||||
|
||||
:::{image} ../../../../../images/count_distinct.svg
|
||||
:alt: Embedded
|
||||
:class: text-center
|
||||
:::
|
||||
|
||||
**Parameters**
|
||||
|
||||
`field`
|
||||
: Column or literal for which to count the number of distinct values.
|
||||
|
||||
`precision`
|
||||
: Precision threshold. Refer to [Counts are approximate](../../esql-functions-operators.md#esql-agg-count-distinct-approximate). The maximum supported value is 40000. Thresholds above this number will have the same effect as a threshold of 40000. The default value is 3000.
|
||||
|
||||
**Description**
|
||||
|
||||
Returns the approximate number of distinct values.
|
||||
|
||||
**Supported types**
|
||||
|
||||
| field | precision | result |
|
||||
| --- | --- | --- |
|
||||
| boolean | integer | long |
|
||||
| boolean | long | long |
|
||||
| boolean | unsigned_long | long |
|
||||
| boolean | | long |
|
||||
| date | integer | long |
|
||||
| date | long | long |
|
||||
| date | unsigned_long | long |
|
||||
| date | | long |
|
||||
| date_nanos | integer | long |
|
||||
| date_nanos | long | long |
|
||||
| date_nanos | unsigned_long | long |
|
||||
| date_nanos | | long |
|
||||
| double | integer | long |
|
||||
| double | long | long |
|
||||
| double | unsigned_long | long |
|
||||
| double | | long |
|
||||
| integer | integer | long |
|
||||
| integer | long | long |
|
||||
| integer | unsigned_long | long |
|
||||
| integer | | long |
|
||||
| ip | integer | long |
|
||||
| ip | long | long |
|
||||
| ip | unsigned_long | long |
|
||||
| ip | | long |
|
||||
| keyword | integer | long |
|
||||
| keyword | long | long |
|
||||
| keyword | unsigned_long | long |
|
||||
| keyword | | long |
|
||||
| long | integer | long |
|
||||
| long | long | long |
|
||||
| long | unsigned_long | long |
|
||||
| long | | long |
|
||||
| text | integer | long |
|
||||
| text | long | long |
|
||||
| text | unsigned_long | long |
|
||||
| text | | long |
|
||||
| version | integer | long |
|
||||
| version | long | long |
|
||||
| version | unsigned_long | long |
|
||||
| version | | long |
|
||||
|
||||
**Examples**
|
||||
|
||||
```esql
|
||||
FROM hosts
|
||||
| STATS COUNT_DISTINCT(ip0), COUNT_DISTINCT(ip1)
|
||||
```
|
||||
|
||||
| COUNT_DISTINCT(ip0):long | COUNT_DISTINCT(ip1):long |
|
||||
| --- | --- |
|
||||
| 7 | 8 |
|
||||
|
||||
With the optional second parameter to configure the precision threshold
|
||||
|
||||
```esql
|
||||
FROM hosts
|
||||
| STATS COUNT_DISTINCT(ip0, 80000), COUNT_DISTINCT(ip1, 5)
|
||||
```
|
||||
|
||||
| COUNT_DISTINCT(ip0, 80000):long | COUNT_DISTINCT(ip1, 5):long |
|
||||
| --- | --- |
|
||||
| 7 | 9 |
|
||||
|
||||
The expression can use inline functions. This example splits a string into multiple values using the `SPLIT` function and counts the unique values
|
||||
|
||||
```esql
|
||||
ROW words="foo;bar;baz;qux;quux;foo"
|
||||
| STATS distinct_word_count = COUNT_DISTINCT(SPLIT(words, ";"))
|
||||
```
|
||||
|
||||
| distinct_word_count:long |
|
||||
| --- |
|
||||
| 5 |
|
||||
|
||||
|
||||
### Counts are approximate [esql-agg-count-distinct-approximate]
|
||||
|
||||
Computing exact counts requires loading values into a set and returning its size. This doesn’t scale when working on high-cardinality sets and/or large values as the required memory usage and the need to communicate those per-shard sets between nodes would utilize too many resources of the cluster.
|
||||
|
||||
This `COUNT_DISTINCT` function is based on the [HyperLogLog++](https://static.googleusercontent.com/media/research.google.com/fr//pubs/archive/40671.pdf) algorithm, which counts based on the hashes of the values with some interesting properties:
|
||||
|
||||
* configurable precision, which decides on how to trade memory for accuracy,
|
||||
* excellent accuracy on low-cardinality sets,
|
||||
* fixed memory usage: no matter if there are tens or billions of unique values, memory usage only depends on the configured precision.
|
||||
|
||||
For a precision threshold of `c`, the implementation that we are using requires about `c * 8` bytes.
|
||||
|
||||
The following chart shows how the error varies before and after the threshold:
|
||||
|
||||

|
||||
|
||||
For all 3 thresholds, counts have been accurate up to the configured threshold. Although not guaranteed, this is likely to be the case. Accuracy in practice depends on the dataset in question. In general, most datasets show consistently good accuracy. Also note that even with a threshold as low as 100, the error remains very low (1-6% as seen in the above graph) even when counting millions of items.
|
||||
|
||||
The HyperLogLog++ algorithm depends on the leading zeros of hashed values, the exact distributions of hashes in a dataset can affect the accuracy of the cardinality.
|
||||
|
||||
The `COUNT_DISTINCT` function takes an optional second parameter to configure the precision threshold. The precision_threshold options allows to trade memory for accuracy, and defines a unique count below which counts are expected to be close to accurate. Above this value, counts might become a bit more fuzzy. The maximum supported value is 40000, thresholds above this number will have the same effect as a threshold of 40000. The default value is `3000`.
|
||||
|
||||
|
|
@ -1,83 +0,0 @@
|
|||
## `DATE_DIFF` [esql-date_diff]
|
||||
|
||||
**Syntax**
|
||||
|
||||
:::{image} ../../../../../images/date_diff.svg
|
||||
:alt: Embedded
|
||||
:class: text-center
|
||||
:::
|
||||
|
||||
**Parameters**
|
||||
|
||||
`unit`
|
||||
: Time difference unit
|
||||
|
||||
`startTimestamp`
|
||||
: A string representing a start timestamp
|
||||
|
||||
`endTimestamp`
|
||||
: A string representing an end timestamp
|
||||
|
||||
**Description**
|
||||
|
||||
Subtracts the `startTimestamp` from the `endTimestamp` and returns the difference in multiples of `unit`. If `startTimestamp` is later than the `endTimestamp`, negative values are returned.
|
||||
|
||||
| Datetime difference units |
|
||||
| --- |
|
||||
| **unit** | **abbreviations** |
|
||||
| year | years, yy, yyyy |
|
||||
| quarter | quarters, qq, q |
|
||||
| month | months, mm, m |
|
||||
| dayofyear | dy, y |
|
||||
| day | days, dd, d |
|
||||
| week | weeks, wk, ww |
|
||||
| weekday | weekdays, dw |
|
||||
| hour | hours, hh |
|
||||
| minute | minutes, mi, n |
|
||||
| second | seconds, ss, s |
|
||||
| millisecond | milliseconds, ms |
|
||||
| microsecond | microseconds, mcs |
|
||||
| nanosecond | nanoseconds, ns |
|
||||
|
||||
Note that while there is an overlap between the function’s supported units and {{esql}}'s supported time span literals, these sets are distinct and not interchangeable. Similarly, the supported abbreviations are conveniently shared with implementations of this function in other established products and not necessarily common with the date-time nomenclature used by {{es}}.
|
||||
|
||||
**Supported types**
|
||||
|
||||
| unit | startTimestamp | endTimestamp | result |
|
||||
| --- | --- | --- | --- |
|
||||
| keyword | date | date | integer |
|
||||
| keyword | date | date_nanos | integer |
|
||||
| keyword | date_nanos | date | integer |
|
||||
| keyword | date_nanos | date_nanos | integer |
|
||||
| text | date | date | integer |
|
||||
| text | date | date_nanos | integer |
|
||||
| text | date_nanos | date | integer |
|
||||
| text | date_nanos | date_nanos | integer |
|
||||
|
||||
**Examples**
|
||||
|
||||
```esql
|
||||
ROW date1 = TO_DATETIME("2023-12-02T11:00:00.000Z"), date2 = TO_DATETIME("2023-12-02T11:00:00.001Z")
|
||||
| EVAL dd_ms = DATE_DIFF("microseconds", date1, date2)
|
||||
```
|
||||
|
||||
| date1:date | date2:date | dd_ms:integer |
|
||||
| --- | --- | --- |
|
||||
| 2023-12-02T11:00:00.000Z | 2023-12-02T11:00:00.001Z | 1000 |
|
||||
|
||||
When subtracting in calendar units - like year, month a.s.o. - only the fully elapsed units are counted. To avoid this and obtain also remainders, simply switch to the next smaller unit and do the date math accordingly.
|
||||
|
||||
```esql
|
||||
ROW end_23=TO_DATETIME("2023-12-31T23:59:59.999Z"),
|
||||
start_24=TO_DATETIME("2024-01-01T00:00:00.000Z"),
|
||||
end_24=TO_DATETIME("2024-12-31T23:59:59.999")
|
||||
| EVAL end23_to_start24=DATE_DIFF("year", end_23, start_24)
|
||||
| EVAL end23_to_end24=DATE_DIFF("year", end_23, end_24)
|
||||
| EVAL start_to_end_24=DATE_DIFF("year", start_24, end_24)
|
||||
```
|
||||
|
||||
| end_23:date | start_24:date | end_24:date | end23_to_start24:integer | end23_to_end24:integer | start_to_end_24:integer |
|
||||
| --- | --- | --- | --- | --- | --- |
|
||||
| 2023-12-31T23:59:59.999Z | 2024-01-01T00:00:00.000Z | 2024-12-31T23:59:59.999Z | 0 | 1 | 0 |
|
||||
|
||||
|
|
@ -1,52 +0,0 @@
|
|||
## `DATE_EXTRACT` [esql-date_extract]
|
||||
|
||||
**Syntax**
|
||||
|
||||
:::{image} ../../../../../images/date_extract.svg
|
||||
:alt: Embedded
|
||||
:class: text-center
|
||||
:::
|
||||
|
||||
**Parameters**
|
||||
|
||||
`datePart`
|
||||
: Part of the date to extract. Can be: `aligned_day_of_week_in_month`, `aligned_day_of_week_in_year`, `aligned_week_of_month`, `aligned_week_of_year`, `ampm_of_day`, `clock_hour_of_ampm`, `clock_hour_of_day`, `day_of_month`, `day_of_week`, `day_of_year`, `epoch_day`, `era`, `hour_of_ampm`, `hour_of_day`, `instant_seconds`, `micro_of_day`, `micro_of_second`, `milli_of_day`, `milli_of_second`, `minute_of_day`, `minute_of_hour`, `month_of_year`, `nano_of_day`, `nano_of_second`, `offset_seconds`, `proleptic_month`, `second_of_day`, `second_of_minute`, `year`, or `year_of_era`. Refer to [java.time.temporal.ChronoField](https://docs.oracle.com/javase/8/docs/api/java/time/temporal/ChronoField.md) for a description of these values. If `null`, the function returns `null`.
|
||||
|
||||
`date`
|
||||
: Date expression. If `null`, the function returns `null`.
|
||||
|
||||
**Description**
|
||||
|
||||
Extracts parts of a date, like year, month, day, hour.
|
||||
|
||||
**Supported types**
|
||||
|
||||
| datePart | date | result |
|
||||
| --- | --- | --- |
|
||||
| keyword | date | long |
|
||||
| keyword | date_nanos | long |
|
||||
| text | date | long |
|
||||
| text | date_nanos | long |
|
||||
|
||||
**Examples**
|
||||
|
||||
```esql
|
||||
ROW date = DATE_PARSE("yyyy-MM-dd", "2022-05-06")
|
||||
| EVAL year = DATE_EXTRACT("year", date)
|
||||
```
|
||||
|
||||
| date:date | year:long |
|
||||
| --- | --- |
|
||||
| 2022-05-06T00:00:00.000Z | 2022 |
|
||||
|
||||
Find all events that occurred outside of business hours (before 9 AM or after 5PM), on any given date:
|
||||
|
||||
```esql
|
||||
FROM sample_data
|
||||
| WHERE DATE_EXTRACT("hour_of_day", @timestamp) < 9 AND DATE_EXTRACT("hour_of_day", @timestamp) >= 17
|
||||
```
|
||||
|
||||
| @timestamp:date | client_ip:ip | event_duration:long | message:keyword |
|
||||
| --- | --- | --- | --- |
|
||||
|
||||
|
|
@ -1,47 +0,0 @@
|
|||
## `DATE_FORMAT` [esql-date_format]
|
||||
|
||||
**Syntax**
|
||||
|
||||
:::{image} ../../../../../images/date_format.svg
|
||||
:alt: Embedded
|
||||
:class: text-center
|
||||
:::
|
||||
|
||||
**Parameters**
|
||||
|
||||
`dateFormat`
|
||||
: Date format (optional). If no format is specified, the `yyyy-MM-dd'T'HH:mm:ss.SSSZ` format is used. If `null`, the function returns `null`.
|
||||
|
||||
`date`
|
||||
: Date expression. If `null`, the function returns `null`.
|
||||
|
||||
**Description**
|
||||
|
||||
Returns a string representation of a date, in the provided format.
|
||||
|
||||
**Supported types**
|
||||
|
||||
| dateFormat | date | result |
|
||||
| --- | --- | --- |
|
||||
| date | | keyword |
|
||||
| date_nanos | | keyword |
|
||||
| keyword | date | keyword |
|
||||
| keyword | date_nanos | keyword |
|
||||
| text | date | keyword |
|
||||
| text | date_nanos | keyword |
|
||||
|
||||
**Example**
|
||||
|
||||
```esql
|
||||
FROM employees
|
||||
| KEEP first_name, last_name, hire_date
|
||||
| EVAL hired = DATE_FORMAT("yyyy-MM-dd", hire_date)
|
||||
```
|
||||
|
||||
| first_name:keyword | last_name:keyword | hire_date:date | hired:keyword |
|
||||
| --- | --- | --- | --- |
|
||||
| Alejandro | McAlpine | 1991-06-26T00:00:00.000Z | 1991-06-26 |
|
||||
| Amabile | Gomatam | 1992-11-18T00:00:00.000Z | 1992-11-18 |
|
||||
| Anneke | Preusig | 1989-06-02T00:00:00.000Z | 1989-06-02 |
|
||||
|
||||
|
|
@ -1,42 +0,0 @@
|
|||
## `DATE_PARSE` [esql-date_parse]
|
||||
|
||||
**Syntax**
|
||||
|
||||
:::{image} ../../../../../images/date_parse.svg
|
||||
:alt: Embedded
|
||||
:class: text-center
|
||||
:::
|
||||
|
||||
**Parameters**
|
||||
|
||||
`datePattern`
|
||||
: The date format. Refer to the [`DateTimeFormatter` documentation](https://docs.oracle.com/en/java/javase/14/docs/api/java.base/java/time/format/DateTimeFormatter.md) for the syntax. If `null`, the function returns `null`.
|
||||
|
||||
`dateString`
|
||||
: Date expression as a string. If `null` or an empty string, the function returns `null`.
|
||||
|
||||
**Description**
|
||||
|
||||
Returns a date by parsing the second argument using the format specified in the first argument.
|
||||
|
||||
**Supported types**
|
||||
|
||||
| datePattern | dateString | result |
|
||||
| --- | --- | --- |
|
||||
| keyword | keyword | date |
|
||||
| keyword | text | date |
|
||||
| text | keyword | date |
|
||||
| text | text | date |
|
||||
|
||||
**Example**
|
||||
|
||||
```esql
|
||||
ROW date_string = "2022-05-06"
|
||||
| EVAL date = DATE_PARSE("yyyy-MM-dd", date_string)
|
||||
```
|
||||
|
||||
| date_string:keyword | date:date |
|
||||
| --- | --- |
|
||||
| 2022-05-06 | 2022-05-06T00:00:00.000Z |
|
||||
|
||||
|
|
@ -1,86 +0,0 @@
|
|||
## `DATE_TRUNC` [esql-date_trunc]
|
||||
|
||||
**Syntax**
|
||||
|
||||
:::{image} ../../../../../images/date_trunc.svg
|
||||
:alt: Embedded
|
||||
:class: text-center
|
||||
:::
|
||||
|
||||
**Parameters**
|
||||
|
||||
`interval`
|
||||
: Interval; expressed using the timespan literal syntax.
|
||||
|
||||
`date`
|
||||
: Date expression
|
||||
|
||||
**Description**
|
||||
|
||||
Rounds down a date to the closest interval.
|
||||
|
||||
**Supported types**
|
||||
|
||||
| interval | date | result |
|
||||
| --- | --- | --- |
|
||||
| date_period | date | date |
|
||||
| date_period | date_nanos | date_nanos |
|
||||
| time_duration | date | date |
|
||||
| time_duration | date_nanos | date_nanos |
|
||||
|
||||
**Examples**
|
||||
|
||||
```esql
|
||||
FROM employees
|
||||
| KEEP first_name, last_name, hire_date
|
||||
| EVAL year_hired = DATE_TRUNC(1 year, hire_date)
|
||||
```
|
||||
|
||||
| first_name:keyword | last_name:keyword | hire_date:date | year_hired:date |
|
||||
| --- | --- | --- | --- |
|
||||
| Alejandro | McAlpine | 1991-06-26T00:00:00.000Z | 1991-01-01T00:00:00.000Z |
|
||||
| Amabile | Gomatam | 1992-11-18T00:00:00.000Z | 1992-01-01T00:00:00.000Z |
|
||||
| Anneke | Preusig | 1989-06-02T00:00:00.000Z | 1989-01-01T00:00:00.000Z |
|
||||
|
||||
Combine `DATE_TRUNC` with [`STATS`](/reference/query-languages/esql/esql-commands.md#esql-stats-by) to create date histograms. For example, the number of hires per year:
|
||||
|
||||
```esql
|
||||
FROM employees
|
||||
| EVAL year = DATE_TRUNC(1 year, hire_date)
|
||||
| STATS hires = COUNT(emp_no) BY year
|
||||
| SORT year
|
||||
```
|
||||
|
||||
| hires:long | year:date |
|
||||
| --- | --- |
|
||||
| 11 | 1985-01-01T00:00:00.000Z |
|
||||
| 11 | 1986-01-01T00:00:00.000Z |
|
||||
| 15 | 1987-01-01T00:00:00.000Z |
|
||||
| 9 | 1988-01-01T00:00:00.000Z |
|
||||
| 13 | 1989-01-01T00:00:00.000Z |
|
||||
| 12 | 1990-01-01T00:00:00.000Z |
|
||||
| 6 | 1991-01-01T00:00:00.000Z |
|
||||
| 8 | 1992-01-01T00:00:00.000Z |
|
||||
| 3 | 1993-01-01T00:00:00.000Z |
|
||||
| 4 | 1994-01-01T00:00:00.000Z |
|
||||
| 5 | 1995-01-01T00:00:00.000Z |
|
||||
| 1 | 1996-01-01T00:00:00.000Z |
|
||||
| 1 | 1997-01-01T00:00:00.000Z |
|
||||
| 1 | 1999-01-01T00:00:00.000Z |
|
||||
|
||||
Or an hourly error rate:
|
||||
|
||||
```esql
|
||||
FROM sample_data
|
||||
| EVAL error = CASE(message LIKE "*error*", 1, 0)
|
||||
| EVAL hour = DATE_TRUNC(1 hour, @timestamp)
|
||||
| STATS error_rate = AVG(error) by hour
|
||||
| SORT hour
|
||||
```
|
||||
|
||||
| error_rate:double | hour:date |
|
||||
| --- | --- |
|
||||
| 0.0 | 2023-10-23T12:00:00.000Z |
|
||||
| 0.6 | 2023-10-23T13:00:00.000Z |
|
||||
|
||||
|
|
@ -5,7 +5,7 @@
|
|||
Round a number up to the nearest integer.
|
||||
|
||||
::::{note}
|
||||
This is a noop for `long` (including unsigned) and `integer`. For `double` this picks the closest `double` value to the integer similar to [Math.ceil](https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/lang/Math.md#ceil(double)).
|
||||
This is a noop for `long` (including unsigned) and `integer`. For `double` this picks the closest `double` value to the integer similar to [Math.ceil](https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/lang/Math.html#ceil(double)).
|
||||
::::
|
||||
|
||||
|
||||
|
|
|
@ -4,9 +4,10 @@
|
|||
|
||||
Subtracts the `startTimestamp` from the `endTimestamp` and returns the difference in multiples of `unit`. If `startTimestamp` is later than the `endTimestamp`, negative values are returned.
|
||||
|
||||
| Datetime difference units |
|
||||
| --- |
|
||||
| **unit** | **abbreviations** |
|
||||
**Datetime difference units**
|
||||
|
||||
| unit | abbreviations |
|
||||
| --- | --- |
|
||||
| year | years, yy, yyyy |
|
||||
| quarter | quarters, qq, q |
|
||||
| month | months, mm, m |
|
||||
|
@ -21,5 +22,9 @@ Subtracts the `startTimestamp` from the `endTimestamp` and returns the differenc
|
|||
| microsecond | microseconds, mcs |
|
||||
| nanosecond | nanoseconds, ns |
|
||||
|
||||
Note that while there is an overlap between the function’s supported units and {{esql}}'s supported time span literals, these sets are distinct and not interchangeable. Similarly, the supported abbreviations are conveniently shared with implementations of this function in other established products and not necessarily common with the date-time nomenclature used by {{es}}.
|
||||
Note that while there is an overlap between the function’s supported units and
|
||||
{{esql}}’s supported time span literals, these sets are distinct and not
|
||||
interchangeable. Similarly, the supported abbreviations are conveniently shared
|
||||
with implementations of this function in other established products and not
|
||||
necessarily common with the date-time nomenclature used by {{es}}.
|
||||
|
||||
|
|
|
@ -5,7 +5,9 @@
|
|||
Round a number down to the nearest integer.
|
||||
|
||||
::::{note}
|
||||
This is a noop for `long` (including unsigned) and `integer`. For `double` this picks the closest `double` value to the integer similar to [Math.floor](https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/lang/Math.md#floor(double)).
|
||||
This is a noop for `long` (including unsigned) and `integer`.
|
||||
For `double` this picks the closest `double` value to the integer
|
||||
similar to [Math.floor](https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/lang/Math.html#floor(double)).
|
||||
::::
|
||||
|
||||
|
||||
|
|
|
@ -2,7 +2,7 @@
|
|||
|
||||
**Description**
|
||||
|
||||
Returns the maximum value from multiple columns. This is similar to [`MV_MAX`](../../../esql-functions-operators.md#esql-mv_max) except it is intended to run on multiple columns at once.
|
||||
Returns the maximum value from multiple columns. This is similar to [`MV_MAX`](/reference/query-languages/esql/esql-functions-operators.md#esql-mv_max) except it is intended to run on multiple columns at once.
|
||||
|
||||
::::{note}
|
||||
When run on `keyword` or `text` fields, this returns the last string in alphabetical order. When run on `boolean` columns this will return `true` if any values are `true`.
|
||||
|
|
|
@ -2,5 +2,5 @@
|
|||
|
||||
**Description**
|
||||
|
||||
Returns the minimum value from multiple columns. This is similar to [`MV_MIN`](../../../esql-functions-operators.md#esql-mv_min) except it is intended to run on multiple columns at once.
|
||||
Returns the minimum value from multiple columns. This is similar to [`MV_MIN`](/reference/query-languages/esql/esql-functions-operators.md#esql-mv_min) except it is intended to run on multiple columns at once.
|
||||
|
||||
|
|
|
@ -2,5 +2,5 @@
|
|||
|
||||
**Description**
|
||||
|
||||
Use `MATCH` to perform a [match query](/reference/query-languages/query-dsl-match-query.md) on the specified field. Using `MATCH` is equivalent to using the `match` query in the Elasticsearch Query DSL. Match can be used on fields from the text family like [text](/reference/elasticsearch/mapping-reference/text.md) and [semantic_text](/reference/elasticsearch/mapping-reference/semantic-text.md), as well as other field types like keyword, boolean, dates, and numeric types. Match can use [function named parameters](/reference/query-languages/esql/esql-syntax.md#esql-function-named-params) to specify additional options for the match query. All [match query parameters](/reference/query-languages/query-dsl-match-query.md#match-field-params) are supported. For a simplified syntax, you can use the [match operator](../../../esql-functions-operators.md#esql-search-operators) `:` operator instead of `MATCH`. `MATCH` returns true if the provided query matches the row.
|
||||
Use `MATCH` to perform a [match query](/reference/query-languages/query-dsl-match-query.md) on the specified field. Using `MATCH` is equivalent to using the `match` query in the Elasticsearch Query DSL. Match can be used on fields from the text family like [text](/reference/elasticsearch/mapping-reference/text.md) and [semantic_text](/reference/elasticsearch/mapping-reference/semantic-text.md), as well as other field types like keyword, boolean, dates, and numeric types. Match can use [function named parameters](/reference/query-languages/esql/esql-syntax.md#esql-function-named-params) to specify additional options for the match query. All [match query parameters](/reference/query-languages/query-dsl-match-query.md#match-field-params) are supported. For a simplified syntax, you can use the [match operator](/reference/query-languages/esql/esql-functions-operators.md#esql-search-operators) `:` operator instead of `MATCH`. `MATCH` returns true if the provided query matches the row.
|
||||
|
||||
|
|
|
@ -2,10 +2,10 @@
|
|||
|
||||
**Description**
|
||||
|
||||
The value that is greater than half of all values and less than half of all values, also known as the 50% [`PERCENTILE`](../../../esql-functions-operators.md#esql-percentile).
|
||||
The value that is greater than half of all values and less than half of all values, also known as the 50% [`PERCENTILE`](/reference/query-languages/esql/esql-functions-operators.md#esql-percentile).
|
||||
|
||||
::::{note}
|
||||
Like [`PERCENTILE`](../../../esql-functions-operators.md#esql-percentile), `MEDIAN` is [usually approximate](../../../esql-functions-operators.md#esql-percentile-approximate).
|
||||
Like [`PERCENTILE`](/reference/query-languages/esql/esql-functions-operators.md#esql-percentile), `MEDIAN` is [usually approximate](/reference/query-languages/esql/esql-functions-operators.md#esql-percentile-approximate).
|
||||
::::
|
||||
|
||||
|
||||
|
|
|
@ -5,7 +5,7 @@
|
|||
Returns the median absolute deviation, a measure of variability. It is a robust statistic, meaning that it is useful for describing data that may have outliers, or may not be normally distributed. For such data it can be more descriptive than standard deviation. It is calculated as the median of each data point’s deviation from the median of the entire sample. That is, for a random variable `X`, the median absolute deviation is `median(|median(X) - X|)`.
|
||||
|
||||
::::{note}
|
||||
Like [`PERCENTILE`](../../../esql-functions-operators.md#esql-percentile), `MEDIAN_ABSOLUTE_DEVIATION` is [usually approximate](../../../esql-functions-operators.md#esql-percentile-approximate).
|
||||
Like [`PERCENTILE`](/reference/query-languages/esql/esql-functions-operators.md#esql-percentile), `MEDIAN_ABSOLUTE_DEVIATION` is [usually approximate](/reference/query-languages/esql/esql-functions-operators.md#esql-percentile-approximate).
|
||||
::::
|
||||
|
||||
|
||||
|
|
|
@ -2,7 +2,11 @@
|
|||
|
||||
**Description**
|
||||
|
||||
Converts a multivalued expression into a single valued column containing the first value. This is most useful when reading from a function that emits multivalued columns in a known order like [`SPLIT`](../../../esql-functions-operators.md#esql-split).
|
||||
Converts a multivalued expression into a single valued column containing the first value. This is most useful when reading from a function that emits multivalued columns in a known order like [`SPLIT`](/reference/query-languages/esql/esql-functions-operators.md#esql-split).
|
||||
|
||||
The order that [multivalued fields](/reference/query-languages/esql/esql-multivalued-fields.md) are read from underlying storage is not guaranteed. It is **frequently** ascending, but don’t rely on that. If you need the minimum value use [`MV_MIN`](../../../esql-functions-operators.md#esql-mv_min) instead of `MV_FIRST`. `MV_MIN` has optimizations for sorted values so there isn’t a performance benefit to `MV_FIRST`.
|
||||
The order that [multivalued fields](/reference/query-languages/esql/esql-multivalued-fields.md) are read from
|
||||
underlying storage is not guaranteed. It is **frequently** ascending, but don’t
|
||||
rely on that. If you need the minimum value use [`MV_MIN`](/reference/query-languages/esql/esql-functions-operators.md#esql-mv_min) instead of
|
||||
`MV_FIRST`. `MV_MIN` has optimizations for sorted values so there isn’t a
|
||||
performance benefit to `MV_FIRST`.
|
||||
|
||||
|
|
|
@ -2,7 +2,11 @@
|
|||
|
||||
**Description**
|
||||
|
||||
Converts a multivalue expression into a single valued column containing the last value. This is most useful when reading from a function that emits multivalued columns in a known order like [`SPLIT`](../../../esql-functions-operators.md#esql-split).
|
||||
Converts a multivalue expression into a single valued column containing the last value. This is most useful when reading from a function that emits multivalued columns in a known order like [`SPLIT`](/reference/query-languages/esql/esql-functions-operators.md#esql-split).
|
||||
|
||||
The order that [multivalued fields](/reference/query-languages/esql/esql-multivalued-fields.md) are read from underlying storage is not guaranteed. It is **frequently** ascending, but don’t rely on that. If you need the maximum value use [`MV_MAX`](../../../esql-functions-operators.md#esql-mv_max) instead of `MV_LAST`. `MV_MAX` has optimizations for sorted values so there isn’t a performance benefit to `MV_LAST`.
|
||||
The order that [multivalued fields](/reference/query-languages/esql/esql-multivalued-fields.md) are read from
|
||||
underlying storage is not guaranteed. It is **frequently** ascending, but don’t
|
||||
rely on that. If you need the maximum value use [`MV_MAX`](/reference/query-languages/esql/esql-functions-operators.md#esql-mv_max) instead of
|
||||
`MV_LAST`. `MV_MAX` has optimizations for sorted values so there isn’t a
|
||||
performance benefit to `MV_LAST`.
|
||||
|
||||
|
|
|
@ -2,7 +2,9 @@
|
|||
|
||||
**Description**
|
||||
|
||||
Returns a subset of the multivalued field using the start and end index values. This is most useful when reading from a function that emits multivalued columns in a known order like [`SPLIT`](../../../esql-functions-operators.md#esql-split) or [`MV_SORT`](../../../esql-functions-operators.md#esql-mv_sort).
|
||||
Returns a subset of the multivalued field using the start and end index values. This is most useful when reading from a function that emits multivalued columns in a known order like [`SPLIT`](/reference/query-languages/esql/esql-functions-operators.md#esql-split) or [`MV_SORT`](/reference/query-languages/esql/esql-functions-operators.md#esql-mv_sort).
|
||||
|
||||
The order that [multivalued fields](/reference/query-languages/esql/esql-multivalued-fields.md) are read from underlying storage is not guaranteed. It is **frequently** ascending, but don’t rely on that.
|
||||
The order that [multivalued fields](/reference/query-languages/esql/esql-multivalued-fields.md) are read from
|
||||
underlying storage is not guaranteed. It is **frequently** ascending, but don’t
|
||||
rely on that.
|
||||
|
||||
|
|
|
@ -2,5 +2,5 @@
|
|||
|
||||
**Description**
|
||||
|
||||
Returns whether the first geometry contains the second geometry. This is the inverse of the [ST_WITHIN](../../../esql-functions-operators.md#esql-st_within) function.
|
||||
Returns whether the first geometry contains the second geometry. This is the inverse of the [ST_WITHIN](/reference/query-languages/esql/esql-functions-operators.md#esql-st_within) function.
|
||||
|
||||
|
|
|
@ -2,5 +2,5 @@
|
|||
|
||||
**Description**
|
||||
|
||||
Returns whether the two geometries or geometry columns are disjoint. This is the inverse of the [ST_INTERSECTS](../../../esql-functions-operators.md#esql-st_intersects) function. In mathematical terms: ST_Disjoint(A, B) ⇔ A ⋂ B = ∅
|
||||
Returns whether the two geometries or geometry columns are disjoint. This is the inverse of the [ST_INTERSECTS](/reference/query-languages/esql/esql-functions-operators.md#esql-st_intersects) function. In mathematical terms: ST_Disjoint(A, B) ⇔ A ⋂ B = ∅
|
||||
|
||||
|
|
|
@ -2,5 +2,5 @@
|
|||
|
||||
**Description**
|
||||
|
||||
Returns true if two geometries intersect. They intersect if they have any point in common, including their interior points (points along lines or within polygons). This is the inverse of the [ST_DISJOINT](../../../esql-functions-operators.md#esql-st_disjoint) function. In mathematical terms: ST_Intersects(A, B) ⇔ A ⋂ B ≠ ∅
|
||||
Returns true if two geometries intersect. They intersect if they have any point in common, including their interior points (points along lines or within polygons). This is the inverse of the [ST_DISJOINT](/reference/query-languages/esql/esql-functions-operators.md#esql-st_disjoint) function. In mathematical terms: ST_Intersects(A, B) ⇔ A ⋂ B ≠ ∅
|
||||
|
||||
|
|
|
@ -2,5 +2,5 @@
|
|||
|
||||
**Description**
|
||||
|
||||
Returns whether the first geometry is within the second geometry. This is the inverse of the [ST_CONTAINS](../../../esql-functions-operators.md#esql-st_contains) function.
|
||||
Returns whether the first geometry is within the second geometry. This is the inverse of the [ST_CONTAINS](/reference/query-languages/esql/esql-functions-operators.md#esql-st_contains) function.
|
||||
|
||||
|
|
|
@ -0,0 +1,6 @@
|
|||
% This is generated by ESQL's AbstractFunctionTestCase. Do no edit it. See ../README.md for how to regenerate it.
|
||||
|
||||
**Description**
|
||||
|
||||
Performs a Term query on the specified field. Returns true if the provided term matches the row.
|
||||
|
|
@ -2,5 +2,5 @@
|
|||
|
||||
**Description**
|
||||
|
||||
Converts an input value to a boolean value. A string value of **true** will be case-insensitive converted to the Boolean **true**. For anything else, including the empty string, the function will return **false**. The numerical value of **0** will be converted to **false**, anything else will be converted to **true**.
|
||||
Converts an input value to a boolean value. A string value of `true` will be case-insensitive converted to the Boolean `true`. For anything else, including the empty string, the function will return `false`. The numerical value of `0` will be converted to `false`, anything else will be converted to `true`.
|
||||
|
||||
|
|
|
@ -2,7 +2,7 @@
|
|||
|
||||
**Description**
|
||||
|
||||
Converts an input value to a date value. A string will only be successfully converted if it’s respecting the format `yyyy-MM-dd'T'HH:mm:ss.SSS'Z'`. To convert dates in other formats, use [`DATE_PARSE`](../../../esql-functions-operators.md#esql-date_parse).
|
||||
Converts an input value to a date value. A string will only be successfully converted if it’s respecting the format `yyyy-MM-dd'T'HH:mm:ss.SSS'Z'`. To convert dates in other formats, use [`DATE_PARSE`](/reference/query-languages/esql/esql-functions-operators.md#esql-date_parse).
|
||||
|
||||
::::{note}
|
||||
Note that when converting from nanosecond resolution to millisecond resolution with this function, the nanosecond date is truncated, not rounded.
|
||||
|
|
|
@ -2,5 +2,5 @@
|
|||
|
||||
**Description**
|
||||
|
||||
Converts an input value to a double value. If the input parameter is of a date type, its value will be interpreted as milliseconds since the [Unix epoch](https://en.wikipedia.org/wiki/Unix_time), converted to double. Boolean **true** will be converted to double **1.0**, **false** to **0.0**.
|
||||
Converts an input value to a double value. If the input parameter is of a date type, its value will be interpreted as milliseconds since the [Unix epoch](https://en.wikipedia.org/wiki/Unix_time), converted to double. Boolean `true` will be converted to double `1.0`, `false` to `0.0`.
|
||||
|
||||
|
|
|
@ -2,5 +2,5 @@
|
|||
|
||||
**Description**
|
||||
|
||||
Converts an input value to an integer value. If the input parameter is of a date type, its value will be interpreted as milliseconds since the [Unix epoch](https://en.wikipedia.org/wiki/Unix_time), converted to integer. Boolean **true** will be converted to integer **1**, **false** to **0**.
|
||||
Converts an input value to an integer value. If the input parameter is of a date type, its value will be interpreted as milliseconds since the [Unix epoch](https://en.wikipedia.org/wiki/Unix_time), converted to integer. Boolean `true` will be converted to integer `1`, `false` to `0`.
|
||||
|
||||
|
|
Some files were not shown because too many files have changed in this diff Show More
Loading…
Reference in New Issue