elasticsearch/docs/reference/esql/functions/count-distinct.asciidoc

[discrete]
[[esql-agg-count-distinct]]
=== `COUNT_DISTINCT`
The approximate number of distinct values.

[source.merge.styled,esql]
----
include::{esql-specs}/stats_count_distinct.csv-spec[tag=count-distinct]
----
[%header.monospaced.styled,format=dsv,separator=|]
|===
include::{esql-specs}/stats_count_distinct.csv-spec[tag=count-distinct-result]
|===

Accepts any field type as input. The result is always a `long`, regardless of
the input type.

[discrete]
==== Counts are approximate

Computing exact counts requires loading values into a set and returning its
size. This doesn't scale for high-cardinality sets or large values: the memory
required, and the need to send those per-shard sets between nodes, would
consume too many cluster resources.
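
To make the cost concrete, here is a minimal Python sketch of the exact approach. It is an illustration only, not how Elasticsearch is implemented: the function name `exact_distinct_count` and the list-of-iterables stand-in for shards are hypothetical.

```python
def exact_distinct_count(shards):
    """Exact distinct count: every shard ships its full set of values.

    `shards` is a hypothetical stand-in: one iterable of values per shard.
    Returns (distinct_count, values_sent_to_coordinator).
    """
    merged = set()
    values_sent = 0
    for shard_values in shards:
        shard_set = set(shard_values)   # per-shard set, held fully in memory
        values_sent += len(shard_set)   # every distinct value is transferred
        merged |= shard_set             # the coordinating node unions the sets
    return len(merged), values_sent


# Three shards with overlapping values.
shards = [range(0, 1000), range(500, 1500), range(1000, 2000)]
print(exact_distinct_count(shards))  # (2000, 3000)
```

Both the memory held and the values transferred grow linearly with cardinality, which is the scaling problem described above.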

The `COUNT_DISTINCT` function is based on the
https://static.googleusercontent.com/media/research.google.com/fr//pubs/archive/40671.pdf[HyperLogLog++]
algorithm, which counts based on hashes of the values and has some useful
properties:

include::../../aggregations/metrics/cardinality-aggregation.asciidoc[tag=explanation]
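
The idea of counting from hashes can be sketched in Python as follows. This is a simplified plain HyperLogLog, not the HyperLogLog++ variant and not the Elasticsearch implementation. The class name `HyperLogLog`, the SHA-1 based hash, and the default of 14 index bits are assumptions made for illustration.

```python
import hashlib
import math


class HyperLogLog:
    """Simplified HyperLogLog sketch (illustrative; not HyperLogLog++)."""

    def __init__(self, p=14):
        self.p = p                     # number of index bits (assumed default)
        self.m = 1 << p                # number of registers
        self.registers = [0] * self.m  # fixed memory, independent of cardinality

    def add(self, value):
        # Derive a 64-bit hash from the value's string form.
        digest = hashlib.sha1(str(value).encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        index = h >> (64 - self.p)               # first p bits pick a register
        rest = h & ((1 << (64 - self.p)) - 1)    # remaining 64 - p bits
        # Rank: position of the leftmost 1-bit in the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[index] = max(self.registers[index], rank)

    def merge(self, other):
        # Sketches with the same p merge by register-wise maximum.
        self.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]

    def estimate(self):
        alpha = 0.7213 / (1 + 1.079 / self.m)    # bias constant for m >= 128
        raw = alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            # Small-range correction: fall back to linear counting.
            return round(self.m * math.log(self.m / zeros))
        return round(raw)
```

Because merging only takes the register-wise maximum, each shard can send a fixed-size sketch instead of its full set of values, and the merged sketch is identical to one built over all values at once.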

[discrete]
==== Precision is configurable

The `COUNT_DISTINCT` function takes an optional second parameter to configure the
precision discussed previously.

[source.merge.styled,esql]
----
include::{esql-specs}/stats_count_distinct.csv-spec[tag=count-distinct-precision]
----
[%header.monospaced.styled,format=dsv,separator=|]
|===
include::{esql-specs}/stats_count_distinct.csv-spec[tag=count-distinct-precision-result]
|===