Commit Graph

3 Commits

Author SHA1 Message Date
Nik Everett 46f95a67b4
ESQL: More MV_* tests (#100564)
This adds more tests for some of the `MV_` functions and updates their
docs now that the railroad diagram and table generated by the tests
covers all of the types.
2023-10-24 16:55:17 -04:00
Abdon Pijpelink 8ac4ba751e
Restructure ES|QL docs (#100806)
* Break out 'Limitations' into separate page

* Add REST API docs

* Restructure commands, functions, and operators refs

* Add placeholder for getting started guide

* Group 'Syntax', 'Metafields', and 'MV fields' under 'Language'

* Add placeholder for Kibana page

* Add link from landing page

* Apply uniform formatting to ACOS, CASE, and DATE_PARSE function refs

* Reword default LIMIT

* Add support for COUNT(*)

* Move 'Commands' and 'Functions and operators' to individual pages

---------

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
2023-10-17 17:36:14 +02:00
Nik Everett 1a1941913d Implement `MV_DEDUPE` (ESQL-1287)
This implements the `MV_DEDUPE` function that removes duplicates from
multivalues fields. It wasn't strictly in our list of things we need in
the first release, but I'm grabbing this now because I realized I needed
very similar infrastructure when I was trying to build grouping by
multivalued fields. In fact, I realized that I could use our
stringtemplate code generation to generate most of the complex parts.
This generates the actual body of `MV_DEDUPE`'s implementation and the
body of the `Block` accepting `BlockHash` implementations. It'll be
useful in the final step for grouping by multivalued fields.

I also got pretty curious about whether the `O(n^2)` or `O(n*log(n))`
algorithm for deduplication is faster. I'd been assuming that for all
reasonable sized inputs the `O(n^2)` bubble sort looking selection
algorithm was faster. So I measured it. And it's mostly true - even for
`BytesRef` if you have a dozen entries the selection algorithm is
faster. Lower overhead and stuff. Anyway, to measure it I had to
implement the copy-and-sort `O(n*log(n))` algorithm. So while I was
there I plugged it in and selected it in cases where the number of
inputs is large and the selection alogorithm is likely to be slower.
2023-06-27 08:13:19 -05:00