Commit Graph

63 Commits

Author SHA1 Message Date
Nik Everett 46f95a67b4
ESQL: More MV_* tests (#100564)
This adds more tests for some of the `MV_` functions and updates their
docs now that the railroad diagram and table generated by the tests
cover all of the types.
2023-10-24 16:55:17 -04:00
AlexB 931dcae41d
Add improvements to the ES|QL docs (#101195)
Content and structural improvements to the ES|QL docs

---------

Co-authored-by: Alexandros Batsakis <abatsakis@splunk.com>
Co-authored-by: Abdon Pijpelink <abdon.pijpelink@elastic.co>
2023-10-23 07:45:42 -07:00
Abdon Pijpelink 8ac4ba751e
Restructure ES|QL docs (#100806)
* Break out 'Limitations' into separate page

* Add REST API docs

* Restructure commands, functions, and operators refs

* Add placeholder for getting started guide

* Group 'Syntax', 'Metafields', and 'MV fields' under 'Language'

* Add placeholder for Kibana page

* Add link from landing page

* Apply uniform formatting to ACOS, CASE, and DATE_PARSE function refs

* Reword default LIMIT

* Add support for COUNT(*)

* Move 'Commands' and 'Functions and operators' to individual pages

---------

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
2023-10-17 17:36:14 +02:00
gheorghepucea cb30096c65
Referenced the svgs of starts_with and trim in asciidoc for consistency. (#100834) 2023-10-13 16:01:47 +02:00
Nik Everett 38eac268b4
ESQL: Build tracked block in EVAL (#100268)
This changes `EVAL` to build tracked blocks so we can trip the breaker
when there are too many tracked blocks hanging about.
2023-10-04 10:29:54 -04:00
Nik Everett 9620512a89
ESQL: Tests for large concat and many evals (#100159) 2023-10-03 14:41:40 -04:00
Luigi Dell'Aquila 6e79013088
ESQL: enhance SHOW FUNCTIONS command (#99736)
Fixes https://github.com/elastic/elasticsearch/issues/99507

Enhance the SHOW FUNCTIONS command to return as much _structured_
information as possible about the function signature, i.e.:
- function name
- return type
- param names
- param types
- param descriptions

For now, as an example, the annotations are used only on the `sin()` and
`date_parse()` functions; **if we agree on this approach**, I'll proceed
to:
- enhance all the currently implemented functions with the needed
  information
- improve the function tests to verify that every newly implemented
  function provides meaningful information

---

This feature can be useful for the end user, but the main goal is to
give Kibana an easy way to produce in-line documentation (contextual
messages, autocomplete) for functions

Similar to the current implementation, which has a `@Named("paramName")`
annotation for function parameters, this PR introduces two more
annotations, `@Param(name, type, description, optional)` and
`@FunctionInfo()`, to provide information about individual parameters and
functions.

The result of a `SHOW FUNCTIONS` query will have the following columns:
- name (keyword): the function name
- synopsis (keyword): the full signature of the function, e.g. `double sin(n:integer|long|double)`
- argNames (keyword MV): the function argument names
- argTypes (keyword MV): the function argument types
- argDescriptions (keyword MV): a textual description of each function argument
- returnType (keyword): the return type of the function
- description (keyword): a textual description of the function

---

Open questions:

- ~~How structured should *types* be? E.g. should we have a strict
  `@Typed("keyword")`/`@Typed({"keyword", "text"})`, or a more generic type
  description, e.g. `@Typed("numeric")`, `@Typed("any")`? The first is more
  useful for API consumption but is hard to do with our complex type system
  (type classes, custom types, unsupported and so on); the second is less
  structured, but probably more useful for documentation, which is the most
  immediate use case of this feature.~~ All the types are listed explicitly.

- ~~We have alternatives for the synopsis, e.g.~~
  - ~~`functionName(<paramName>:<paramType>, ...): <returnType>`~~
  - ~~`<returnType> functionName(<paramName>:<paramType>, ...)`~~
  - ~~`<returnType> functionName(<paramType> <paramName>, ...)`~~

  Using `<returnType> functionName(<paramName>:<paramType>, ...)` for now. If multiple types are supported, they are separated by pipes, e.g. `double sin(n:integer|long|double)`.
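The following is a minimal Java sketch of how annotations like these could be read reflectively to render the chosen synopsis format. The `FunctionInfo` and `Param` declarations, their element names, the toy `Sin` class, and the choice to annotate a static method are all hypothetical stand-ins. They are not the annotations or classes this PR adds.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.lang.reflect.Parameter;
import java.util.ArrayList;
import java.util.List;

// Hypothetical function-level metadata; element names are assumptions.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface FunctionInfo {
    String name();
    String returnType();
    String description() default "";
}

// Hypothetical parameter-level metadata; several types mean "any one of these".
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.PARAMETER)
@interface Param {
    String name();
    String[] type();
    String description() default "";
    boolean optional() default false;
}

// Toy function class annotated the way the description above suggests.
final class Sin {
    @FunctionInfo(name = "sin", returnType = "double", description = "Returns the sine of an angle.")
    static double eval(@Param(name = "n", type = { "integer", "long", "double" }, description = "An angle, in radians.") double n) {
        return Math.sin(n);
    }
}

final class Synopsis {
    // Renders "<returnType> functionName(<paramName>:<paramType>, ...)",
    // joining alternative parameter types with '|'.
    static String of(Class<?> function) {
        for (Method method : function.getDeclaredMethods()) {
            FunctionInfo info = method.getAnnotation(FunctionInfo.class);
            if (info == null) {
                continue;
            }
            List<String> params = new ArrayList<>();
            for (Parameter p : method.getParameters()) {
                Param param = p.getAnnotation(Param.class);
                if (param == null) {
                    throw new IllegalStateException("unannotated parameter on " + method);
                }
                params.add(param.name() + ":" + String.join("|", param.type()));
            }
            return info.returnType() + " " + info.name() + "(" + String.join(", ", params) + ")";
        }
        throw new IllegalArgumentException("no @FunctionInfo on " + function.getName());
    }
}
```

With this sketch, `Synopsis.of(Sin.class)` yields `double sin(n:integer|long|double)`, matching the example synopsis above.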
2023-10-02 13:56:41 -04:00
AlexB 2ccdae6745
Eval REPLACE function (#98909)
Co-authored-by: Alexandros Batsakis <abatsakis@splunk.com>
Co-authored-by: Andrei Stefan <andrei@elastic.co>
2023-09-29 17:41:20 +03:00
Nik Everett e1b1f6f1db
ESQL: Create `Block.Ref` (#100042)
This creates `Block.Ref`, a reference to a `Block` which may or may not
be part of a `Page`. `Block.Ref` is `Releasable` and closing it is a
noop if the `Block` is part of a `Page`, but if it is "free floating"
then closing the `Block.Ref` will close the block.

It also modifies `ExpressionEvaluator` to return a `Block.Ref` instead
of a `Block`, so you tend to work with `ExpressionEvaluator`s like
this:

```java
try (Block.Ref ref = eval.eval(page)) {
  // Closing the ref releases the block only if it is free floating.
  return ref.block().doStuff();
}
```

This should make it *much* easier to release the memory from `Block`s
built by `ExpressionEvaluator`s.

This change is mostly mechanical, introducing the new signature for
`ExpressionEvaluator`. In a follow-up change I'll modify the tests to
make sure we're correctly using it to close pages.

I did think about changing `ExpressionEvaluator` to add a method telling
you if the block that it returns must be closed or not. This would have
been more difficult to work with, and, ultimately, limiting.
Specifically, it is possible for an `ExpressionEvaluator` to *sometimes*
return a free floating block and other times return one that is
contained in a `Page`. Imagine `mv_concat` - it returns the block it
receives if the block doesn't have multivalued fields. Otherwise it
concats things. If that block happens to come directly out of the
`Page`, then `mv_concat` will sometimes produce free floating blocks and
sometimes not.
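Here is a small, self-contained Java sketch that makes the ownership rule concrete. `Block`, `Page`, `Block.Ref` and `ConcatLikeEvaluator` are simplified hypothetical stand-ins for the real Elasticsearch types. A `Ref` built for a page-owned block does nothing on close, while one built for a free floating block closes it. Callers can therefore always use try-with-resources without knowing which case they got.

```java
import java.util.List;

// Hypothetical, heavily simplified stand-ins; not the Elasticsearch Block/Page classes.
final class Block implements AutoCloseable {
    final List<String> values;
    boolean closed;

    Block(List<String> values) {
        this.values = values;
    }

    @Override
    public void close() {
        closed = true; // a real block would release its memory here
    }

    // A reference to a block that closes it only if the block is free floating.
    record Ref(Block block, boolean floating) implements AutoCloseable {
        static Ref freeFloating(Block block) {
            return new Ref(block, true);
        }

        static Ref ownedByPage(Block block) {
            return new Ref(block, false);
        }

        @Override
        public void close() {
            if (floating) {
                block.close(); // page-owned blocks are left for the page's owner to close
            }
        }
    }
}

record Page(List<Block> blocks) {
    Block getBlock(int index) {
        return blocks.get(index);
    }
}

// Toy evaluator shaped like the mv_concat case: it may hand back the page's own
// block or a newly built one, and the caller does not need to know which.
final class ConcatLikeEvaluator {
    static Block.Ref eval(Page page, boolean hasMultivalues) {
        Block input = page.getBlock(0);
        if (hasMultivalues == false) {
            return Block.Ref.ownedByPage(input);
        }
        return Block.Ref.freeFloating(new Block(List.of(String.join("", input.values))));
    }
}
```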
2023-09-29 09:26:44 -04:00
Nik Everett 5e3ab06151
ESQL: Prevent `CONCAT` from using a ton of memory (#99716)
This prevents `CONCAT` from using an unbounded amount of memory by
hooking its temporary value into the circuit breaker. To do so, it
makes *all* `ExpressionEvaluator`s `Releasable`. Most of the changes in
this PR just plumb that through to every evaluator. The rest of the
changes correctly release evaluators after their use.

I considered another tactic but didn't like it as much, even though the
number of changes would have been smaller: I could have created a fresh,
`Releasable` temporary value for every `Page`, which would have kept the
releasable pretty contained. But I wanted to share the temporary
state across runs to avoid a bunch of allocations.
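Below is a rough Java sketch of the "share the temporary state across runs" approach, under two explicit assumptions. First, `Breaker` is a hypothetical minimal circuit breaker, not Elasticsearch's `CircuitBreaker`. Second, `ConcatScratch` is a hypothetical reusable buffer that reports every growth to the breaker before allocating and gives the bytes back on `close()`.

```java
import java.util.Arrays;

// Hypothetical minimal breaker; Elasticsearch's CircuitBreaker API is different and richer.
final class Breaker {
    private final long limit;
    private long used;

    Breaker(long limit) {
        this.limit = limit;
    }

    void addEstimate(long bytes) {
        if (used + bytes > limit) {
            throw new IllegalStateException("breaker tripped: " + (used + bytes) + " > " + limit);
        }
        used += bytes;
    }

    void release(long bytes) {
        used -= bytes;
    }

    long used() {
        return used;
    }
}

// Scratch space for a CONCAT-like evaluator. It is reused across pages to avoid
// reallocating, and every byte it grows by is reported to the breaker first.
final class ConcatScratch implements AutoCloseable {
    private final Breaker breaker;
    private byte[] buffer = new byte[0];

    ConcatScratch(Breaker breaker) {
        this.breaker = breaker;
    }

    byte[] concat(byte[]... parts) {
        int total = 0;
        for (byte[] part : parts) {
            total = Math.addExact(total, part.length);
        }
        if (total > buffer.length) {
            breaker.addEstimate(total - buffer.length); // throws before allocating if over the limit
            buffer = Arrays.copyOf(buffer, total);
        }
        int offset = 0;
        for (byte[] part : parts) {
            System.arraycopy(part, 0, buffer, offset, part.length);
            offset += part.length;
        }
        // The returned copy is the row's result; this sketch does not track it in the breaker.
        return Arrays.copyOf(buffer, total);
    }

    @Override
    public void close() {
        breaker.release(buffer.length);
        buffer = new byte[0];
    }
}
```

In this sketch, a concatenation that would push the scratch buffer past the limit throws before allocating. Closing the scratch returns everything it reserved.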

Here's a script that used to crash before this PR but is fine after:
```
curl -uelastic:password -XDELETE localhost:9200/test
curl -HContent-Type:application/json -uelastic:password -XPUT localhost:9200/test -d'{
   "mappings": {
      "properties": {
         "short": {
            "type": "keyword"
         }
      }
   }
}'
curl -HContent-Type:application/json -uelastic:password -XPUT localhost:9200/test/_doc/1?refresh -d'{"short": "short"}'

echo -n '{"query": "FROM test ' > /tmp/evil
for i in {0..9}; do
   echo -n '| EVAL short = CONCAT(short' >> /tmp/evil
   for j in {1..9}; do
      echo -n ', short' >> /tmp/evil
   done
   echo -n ')' >> /tmp/evil
done
echo '| EVAL len = LENGTH(short) | KEEP len"}'>> /tmp/evil
curl -HContent-Type:application/json -uelastic:password -XPOST localhost:9200/_query?pretty --data-binary @/tmp/evil
```
2023-09-22 11:27:13 -04:00
Bogdan Pintea 34eea49ef5
ESQL: Swap arguments of remaining date_xxx() functions (#99561)
This swaps the arguments of the `date_extract()`, `date_format()` and
`date_parse()` functions to align them with `date_trunc()`. The field
argument is now always last, even for `date_format()` and `date_parse()`,
whose optional argument is now provided first.
2023-09-19 20:22:34 +02:00
gheorghepucea d58b9ea87d
Added esql ends_with implementation (#99613)
Added an implementation of the `ends_with` function in ESQL. `ends_with`
returns a boolean that indicates whether a keyword string ends with
another string. Also made sure that the docs look alright.
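Here is a minimal Java sketch of a byte-level `ends_with` check on UTF-8 data. It assumes non-null inputs and that both values are valid UTF-8. It is an illustration only, not the PR's implementation.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class EndsWith {
    // Byte-wise suffix test on UTF-8. Because UTF-8 is self-synchronizing, a valid
    // UTF-8 suffix can only match starting at a code point boundary, so no decoding is needed.
    static boolean endsWith(byte[] str, byte[] suffix) {
        if (suffix.length > str.length) {
            return false;
        }
        int start = str.length - suffix.length;
        return Arrays.equals(str, start, str.length, suffix, 0, suffix.length);
    }

    static boolean endsWith(String str, String suffix) {
        return endsWith(str.getBytes(StandardCharsets.UTF_8), suffix.getBytes(StandardCharsets.UTF_8));
    }
}
```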
2023-09-18 11:29:20 -04:00
Nik Everett 0d8a1975a9
ESQL: Fix test for unsigned long (#99441)
We were generating negative values which made the tests confused.
2023-09-12 11:46:09 -04:00
Nik Everett 44c3cde48c ESQL: Fix compile
Two PRs cross in the night. Then nothing compiles.
2023-09-11 14:35:05 -04:00
Nik Everett 936e69ddd5
ESQL: Yet more function tests and docs (#99009)
This adds tests, supported types, and a signature image for `to_string`
and `to_version`. It also fixes the resolution of functions whose names
contain an `_`.

Finally, it updates the docs for `ceil` to render the image more nicely.
2023-09-11 14:10:17 -04:00
Abdon Pijpelink 91759ce592
[DOCS] Some minor ES|QL docs fixes (#99423) 2023-09-11 16:20:10 +02:00
dreamquster 04381664c1
ESQL: Implement 'right' function (#98974)
Add the 'right' function, which extracts a substring from the right end
of a string (the counterpart of 'left').

---------

Co-authored-by: Alexander Spies <alexander.spies@elastic.co>
2023-09-08 17:27:59 +02:00
Nik Everett b73cc0c529
ESQL: Only generate syntax diagrams locally (#99059)
CI will skip building them. Lots of CI machines don't have font support,
so they can't generate these. But all local machines have a GUI, so they
can.

Also, super-lazily initialize the font so CI doesn't bump into it by
accident.

Closes #99018
2023-08-30 14:44:14 -04:00
dreamquster 2644ccbb8a
Implement the 'left' function in issue #98545 (#98942)
@nik9000 I re-checked out the main branch and refactored the `left`
function to cut the prefix string in place. But I've run into a problem:
`left` fails the test case `testEvaluateInManyThreads`. In the
multithreaded case, this code:

```java
EvalOperator.ExpressionEvaluator eval = evalSupplier.get();
for (int c = 0; c < count; c++) {
    assertThat(toJavaObject(eval.eval(page), 0), testCase.getMatcher());
}
```

`toJavaObject` returns a `BytesRef` with length=2 and content [81,89].
However, the `assertThat` function in JUnit 4 receives a `BytesRef`
parameter whose length is 10. Can you give me some clues? I can't find
which variable is shared.

Command to rerun the failed test case:

```
gradlew ':x-pack:plugin:esql:test' --tests "org.elasticsearch.xpack.esql.expression.function.scalar.string.LeftTests.testEvaluateInManyThreads {TestCase=Left basic test}" -Dtests.seed=44459C172243712 -Dtests.locale=lv-LV -Dtests.timezone=Asia/Irkutsk -Druntime.java=20
```
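The following is a minimal Java sketch of the "cut the prefix in place" idea, plus its mirror image for `right`. Both functions count code points in UTF-8 bytes and return a new view over the same array instead of copying. `Utf8View` is a hypothetical stand-in for Lucene's `BytesRef`. Clamping to the whole string when `n` exceeds its length is an assumption. The sketch never writes to the view it is given. Mutating a view that several threads read is one common way to get length mismatches like the one described above, although the message does not say what the actual cause was.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical BytesRef-like view (bytes + offset + length); not Lucene's BytesRef.
record Utf8View(byte[] bytes, int offset, int length) {
    static Utf8View of(String s) {
        byte[] b = s.getBytes(StandardCharsets.UTF_8);
        return new Utf8View(b, 0, b.length);
    }

    String utf8ToString() {
        return new String(bytes, offset, length, StandardCharsets.UTF_8);
    }
}

final class LeftRight {
    // UTF-8 continuation bytes look like 10xxxxxx.
    private static boolean isContinuation(byte b) {
        return (b & 0xC0) == 0x80;
    }

    // First n code points, as a new view over the same bytes: nothing is copied
    // and the input view is never modified.
    static Utf8View left(Utf8View s, int n) {
        int end = s.offset();
        int limit = s.offset() + s.length();
        for (int cp = 0; cp < n && end < limit; cp++) {
            end++; // the lead byte
            while (end < limit && isContinuation(s.bytes()[end])) {
                end++; // its continuation bytes
            }
        }
        return new Utf8View(s.bytes(), s.offset(), end - s.offset());
    }

    // Last n code points, walking backwards to each code point's lead byte.
    static Utf8View right(Utf8View s, int n) {
        int start = s.offset() + s.length();
        for (int cp = 0; cp < n && start > s.offset(); cp++) {
            start--;
            while (start > s.offset() && isContinuation(s.bytes()[start])) {
                start--;
            }
        }
        return new Utf8View(s.bytes(), start, s.offset() + s.length() - start);
    }
}
```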
2023-08-28 13:17:16 -04:00
Nik Everett 4cd7f40712
ESQL: More docs (#98890)
Adds some more typed docs to the ESQL functions.
2023-08-28 11:17:04 -04:00
Alexander Spies ca3dc3a882
ESQL: Add `CEIL` function (#98847)
Add the unary scalar function CEIL.

Analogous to FLOOR, it rounds its argument up.

- Implement CEIL, add it to the function registry and make sure it is serializable.
- Add csv tests, unit tests and docs.
- Add additional csv tests with different data types and some edge cases for both CEIL and FLOOR
- Add unit tests and update docs for FLOOR.
2023-08-28 12:31:56 +02:00
Nik Everett ff01fb680b
ESQL: Standardize font used in railroad diagrams (#98897)
Locks the railroad diagrams to always use the same font, `roboto mono`.
This makes sure that when we render the railroad diagrams we always
size them the same way, because everyone has a copy of roboto mono:
Gradle resolves that dependency.
2023-08-26 14:19:47 -04:00
Nik Everett 649ceb74ab
ESQL docs: generate references for functions (#98856)
This generates a "railroad diagram" svg image that can be embedded into
the docs for any function to explain its syntax. It's basic, but it's
something we can iterate on.

It also generates a table of supported types from the list of types that
we test. It can be included in the docs for reference as well.
2023-08-25 09:07:25 -04:00
Nik Everett 65ea90d3fd
ESQL: LEAST and GREATEST functions (#98630)
Adds `LEAST` and `GREATEST` functions to find the min or max of the
values in many columns.
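Here is a minimal Java sketch of the row-wise semantics: for each row, take the min or max across all the columns. The `int[]`-per-column layout is a hypothetical simplification. Nulls, multivalues and other types are left out.

```java
import java.util.function.IntBinaryOperator;

// Hypothetical columnar layout: each column is an int[] holding one value per row.
final class LeastGreatest {
    // Folds the columns together row by row; the first column is cloned, not modified.
    private static int[] reduceRows(IntBinaryOperator op, int[]... columns) {
        int[] out = columns[0].clone();
        for (int c = 1; c < columns.length; c++) {
            for (int r = 0; r < out.length; r++) {
                out[r] = op.applyAsInt(out[r], columns[c][r]);
            }
        }
        return out;
    }

    static int[] least(int[]... columns) {
        return reduceRows(Math::min, columns);
    }

    static int[] greatest(int[]... columns) {
        return reduceRows(Math::max, columns);
    }
}
```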
2023-08-22 14:15:04 -04:00
Bogdan Pintea 372458c9fd
ESQL: date_trunc(): swap order of arguments (#98624)
Swap the order of the arguments so that the range parameter comes first
and the datetime one second, in line with other languages.
2023-08-22 18:20:05 +02:00
Nik Everett 44e61341f2
ESQL: COALESCE function (#98542)
This adds a `COALESCE` function that returns the first non-null value.
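Here is a minimal Java sketch of the same idea. It uses a hypothetical columnar layout in which each nullable column is an `Integer[]`. It is not the ESQL implementation.

```java
// Hypothetical: each nullable column is an Integer[] with one entry per row.
final class Coalesce {
    // For every row, the first non-null value across the columns, or null if all are null.
    static Integer[] coalesce(Integer[]... columns) {
        Integer[] out = new Integer[columns[0].length];
        for (int r = 0; r < out.length; r++) {
            for (Integer[] column : columns) {
                if (column[r] != null) {
                    out[r] = column[r];
                    break;
                }
            }
        }
        return out;
    }
}
```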
2023-08-17 13:51:44 -04:00
Nik Everett a380e8c369
ESQL: LTRIM, RTRIM and fix unicode whitespace (#98590)
Here we add support for the following two ESQL functions:
* LTRIM: remove leading spaces from a string
* RTRIM: remove trailing spaces from a string

We also fix an issue with the handling of Unicode whitespace. We use
Unicode code points to identify Unicode whitespace characters instead
of relying on ASCII codes.

Moreover, iterating over the bytes of a Unicode string needs to account
for characters that are encoded using multiple bytes.
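Here is a minimal Java sketch of code-point-aware trimming. Java strings are UTF-16 rather than UTF-8, but the same care applies, because a single code point can span more than one unit. Using `Character.isWhitespace` as the definition of whitespace is an assumption; it excludes no-break spaces such as U+00A0. The JDK's `String.stripLeading()` and `String.stripTrailing()` apply the same rule.

```java
final class Trim {
    // Drops leading code points for which Character.isWhitespace is true. Stepping by
    // Character.charCount keeps supplementary characters (two UTF-16 units) intact.
    static String ltrim(String s) {
        int start = 0;
        while (start < s.length()) {
            int cp = s.codePointAt(start);
            if (Character.isWhitespace(cp) == false) {
                break;
            }
            start += Character.charCount(cp);
        }
        return s.substring(start);
    }

    // Same idea from the end, using codePointBefore to step back one whole code point.
    static String rtrim(String s) {
        int end = s.length();
        while (end > 0) {
            int cp = s.codePointBefore(end);
            if (Character.isWhitespace(cp) == false) {
                break;
            }
            end -= Character.charCount(cp);
        }
        return s.substring(0, end);
    }
}
```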
2023-08-17 11:30:12 -04:00
Craig Taverner aad16b7d6b
Simple ESQL pow() docs fixes after re-reviewing (#98601) 2023-08-17 17:29:06 +02:00
Andrei Stefan 014bd33f45
ESQL: replace the is_null function with IS NULL and IS NOT NULL predicates (#98412) 2023-08-16 20:19:40 +03:00
Kostas Krikellas b498ce9ff4
`Sqrt` function for ESQL (#98449)
* Sqrt function for ESQL

Introduces a unary scalar function for square root, which is a thin
wrapper over the `java.lang.Math` implementation.

* Fix area for ESQL integration changelog.

* Restore changelog.

* Restore area in changelog.
2023-08-16 16:33:30 +03:00
Nik Everett 24b2d16f95 Add `to_degrees` and `to_radians` functions (ESQL-1496)
This adds the `to_degrees` and `to_radians` functions. It uses the
"convert" function framework because that just felt right - these
convert between radians and degrees after all.
2023-08-03 04:23:15 +10:00
Nik Everett c1601f5a9c Add remaining trigonometric functions (ESQL-1518)
Adds the remaining trigonometric functions, `ACOS`, `ASIN`, `ATAN`, and
`ATAN2`.

---------

Co-authored-by: Bogdan Pintea <pintea@mailbox.org>
2023-08-03 01:12:10 +10:00
Nik Everett c44a245cae Add trigonometric functions (ESQL-1513)
This adds `SIN`, `COS`, `TAN`, `SINH`, `COSH`, and `TANH` functions.

---------

Co-authored-by: Abdon Pijpelink <abdon.pijpelink@elastic.co>
2023-08-01 14:36:55 -04:00
Nik Everett 6c87075564 Support `auto_bucket` for numeric fields (ESQL-1494)
This adds support for numeric fields to `auto_bucket` and adds a new
`floor` function to round numbers down to the nearest integer. That
function is exposed because it's probably useful. I added it in this PR
because `auto_bucket` uses it as an implementation detail as well.
2023-07-31 16:45:59 -04:00
Craig Taverner 9e566c9068 Update docs/reference/esql/functions/pow.asciidoc
Co-authored-by: Bogdan Pintea <pintea@mailbox.org>
2023-07-24 15:50:13 +02:00
Craig Taverner 75ea3ab3cd Numerical overflow should result in `null` and a warning
To implement this we:

* Cast both arguments to double
* Perform integer and long validation on the double results before casting back to integer or long
* Perform a special case validation for exponent==1
* Any validation failures result in ArithmeticException, which is caught and added to warnings
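Here is a minimal Java sketch of those steps for the `int` case only. The `warnings` list is a hypothetical stand-in for ESQL's warning mechanism. The `exponent==1` special case is not modeled.

```java
import java.util.ArrayList;
import java.util.List;

final class IntPow {
    // Hypothetical per-query warning sink.
    final List<String> warnings = new ArrayList<>();

    // Computes in double, range-checks, then narrows; overflow yields null plus a warning.
    Integer pow(int base, int exponent) {
        try {
            return toIntExact(Math.pow(base, exponent));
        } catch (ArithmeticException e) {
            warnings.add("java.lang.ArithmeticException: " + e.getMessage());
            return null;
        }
    }

    // Validates the double result against the int range before casting back.
    private static int toIntExact(double value) {
        if (Double.isNaN(value) || value < Integer.MIN_VALUE || value > Integer.MAX_VALUE) {
            throw new ArithmeticException("integer overflow");
        }
        return (int) value; // truncates toward zero, e.g. for negative exponents
    }
}
```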
2023-07-20 19:03:29 +02:00
Craig Taverner 925bdf49a8 Improve documentation for pow function and refined type rules 2023-07-20 11:32:09 +02:00
Luigi Dell'Aquila 95d9fd75ed Add date_extract function (ESQL-1346) 2023-07-19 14:08:06 +02:00
Abdon Pijpelink d204de411b Merge pull request ESQL-1393 from abdonpijpelink/es-pipe-ql
[DOCS] Change ESQL into ES|QL and other docs improvements
2023-07-11 10:07:18 +02:00
Martijn van Groningen b259248568 Reused example from spec file 2023-07-10 10:17:12 +02:00
Martijn van Groningen c406b64058 use ROW in docs and added test with ROWS 2023-07-07 20:57:39 +02:00
Martijn van Groningen 3c3963cc28 Add trim function
This change adds a string `trim` function.
2023-07-07 17:37:38 +02:00
Abdon Pijpelink 68b74bea34 Move IS_NULL, POW, ROUND, STARTS_WITH, SUBSTRING code snippets to CSV files 2023-07-07 15:45:06 +02:00
Mark Tozzi 985b1949cb Log base 10 for ESQL (ESQL-1358)
Introduces a unary scalar function for base 10 log, which is a thin
wrapper over the `java.lang.Math` implementation.

---------

Co-authored-by: Abdon Pijpelink <abdon.pijpelink@elastic.co>
2023-07-06 15:37:35 -04:00
Nik Everett f3b20067a3 Add `PI` and `TAU` functions (ESQL-1357)
Adds functions for the constants `PI` and its big brother `TAU`.
2023-07-05 16:06:06 -05:00
Bogdan Pintea 48cb069670 Add unsigned_long type support (ESQL-1289)
This adds support for the `unsigned_long` type.
The type can now be used with the defined math functions (both scalar and
multivalued), arithmetic operations, and binary comparisons.
The `to_unsigned_long()` conversion function is also added.
2023-07-04 15:45:10 +02:00
Nik Everett 1a1941913d Implement `MV_DEDUPE` (ESQL-1287)
This implements the `MV_DEDUPE` function that removes duplicates from
multivalued fields. It wasn't strictly in our list of things we need in
the first release, but I'm grabbing this now because I realized I needed
very similar infrastructure when I was trying to build grouping by
multivalued fields. In fact, I realized that I could use our
stringtemplate code generation to generate most of the complex parts.
This generates the actual body of `MV_DEDUPE`'s implementation and the
body of the `Block` accepting `BlockHash` implementations. It'll be
useful in the final step for grouping by multivalued fields.

I also got pretty curious about whether the `O(n^2)` or `O(n*log(n))`
algorithm for deduplication is faster. I'd been assuming that for all
reasonably sized inputs the `O(n^2)` bubble-sort-like selection
algorithm was faster. So I measured it. And it's mostly true - even for
`BytesRef` if you have a dozen entries the selection algorithm is
faster. Lower overhead and stuff. Anyway, to measure it I had to
implement the copy-and-sort `O(n*log(n))` algorithm. So while I was
there I plugged it in and selected it in cases where the number of
inputs is large and the selection algorithm is likely to be slower.
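Here is a minimal Java sketch of the two strategies applied to a multivalued `int` field. The quadratic variant is a simple keep-if-unseen scan, standing in for the selection algorithm described above. `SORT_THRESHOLD` is a made-up placeholder, not the measured cut-over. The two variants return values in different orders, and the sketch makes no claim about the order `MV_DEDUPE` produces.

```java
import java.util.Arrays;

final class Dedupe {
    static final int SORT_THRESHOLD = 32; // hypothetical cut-over point

    static int[] dedupe(int[] values) {
        return values.length < SORT_THRESHOLD ? dedupeByScan(values) : dedupeBySort(values);
    }

    // O(n^2): keep a value only if it is not already among the kept ones.
    // Low overhead and no copy-then-sort, which is why it wins on small inputs.
    static int[] dedupeByScan(int[] values) {
        int[] kept = new int[values.length];
        int n = 0;
        next:
        for (int v : values) {
            for (int i = 0; i < n; i++) {
                if (kept[i] == v) {
                    continue next;
                }
            }
            kept[n++] = v;
        }
        return Arrays.copyOf(kept, n);
    }

    // O(n log n): copy, sort, then compact runs of equal values.
    static int[] dedupeBySort(int[] values) {
        if (values.length == 0) {
            return new int[0];
        }
        int[] sorted = values.clone();
        Arrays.sort(sorted);
        int n = 1;
        for (int i = 1; i < sorted.length; i++) {
            if (sorted[i] != sorted[n - 1]) {
                sorted[n++] = sorted[i];
            }
        }
        return Arrays.copyOf(sorted, n);
    }
}
```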
2023-06-27 08:13:19 -05:00
Luigi Dell'Aquila c2c0b0fa0d Implement now() function (ESQL-1172)
Returns the current datetime.
2023-06-27 12:01:09 +02:00
Nik Everett 35fddc2281 Create e() function (ESQL-1304)
Euler's number.
2023-06-22 10:02:23 -04:00
Luigi Dell'Aquila 100ca0acca Rename PROJECT command to KEEP (ESQL-1282) 2023-06-19 13:06:44 +02:00