elasticsearch

Commit Graph

Author	SHA1	Message	Date
Stef Nestor	c1019d4c5d	(Doc+) Link API doc to parent object - part1 (#111951 ) * (Doc+) Link API to parent Doc part1 --------- Co-authored-by: shainaraskas <shaina.raskas@elastic.co> Co-authored-by: shainaraskas <58563081+shainaraskas@users.noreply.github.com>	2024-08-20 14:58:18 -06:00
Nik Everett	d8e705d5da	ESQL: Document `date` instead of `datetime` (#111985 ) This changes the generated types tables in the docs to say `date` instead of `datetime`. That's the name of the field in Elasticsearch so it's a lot less confusing to call it that. Closes #111650	2024-08-21 01:59:13 +10:00
Iván Cea Fontenla	65ce50c60a	ESQL: Added mv_percentile function (#111749 ) - Added the `mv_percentile(values, percentile)` function - Used as a surrogate in the `percentile(column, percentile)` aggregation - Updated docs to specify that the surrogate _should_ be implemented if possible The same way as mv_median does, this yields exact results (Ignoring double operations error). For that, some decisions were made, especially in the long evaluator (Check the comments in context in `MvPercentile.java`) Closes https://github.com/elastic/elasticsearch/issues/111591	2024-08-20 15:29:19 +02:00
Iván Cea Fontenla	e3f378ebd2	ESQL: Strings support for MAX and MIN aggregations (#111544 ) Support Version, Keyword and Text in Max and Min aggregations. The current implementation of both max and min does: For non-grouping: - Store a BytesRef - When there's a max/min, copy it to the internal array. Grow it if needed For grouping: - Keep an array of BytesRef (null by default: there's no "initial/default value" here, as there's no "MAX" value for a string) - Each BytesRef stores its own array, which will be grown as needed to copy the new max/min Some notes: - It's not shrinking the arrays, as to avoid having to copy, and potentially grow it again - It's using raw arrays. But maybe it should use BigArrays to compute in the circuit breaker? Part of https://github.com/elastic/elasticsearch/issues/110346	2024-08-20 15:24:55 +02:00
Bogdan Pintea	dd49c33479	ESQL: BUCKET: allow numerical spans as whole numbers (#111874 ) This relaxes the check on numerical spans to allow them to be specified as whole numbers. So far it was required that they be provided as a double. This also expands the tests for date ranges to include string types. Resolves #109340, resolves #104646, resolves #105375.	2024-08-20 13:40:59 +02:00
Nik Everett	dc24003540	ESQL: Profile more timing information (#111855 ) This profiles additional timing information for each individual driver. To the results from `profile` it adds the start and stop time for each driver. That was already in the task status. To the profile and task status it also adds the number of times the driver slept and some more detailed history about a few of those times. Explanation time! The compute engine splits work into some number of `Drivers` per node. Each `Driver` is a single threaded entity - it runs on a thread for a while then does one of three things: 1. Finishes 2. Goes async because one of its `Operator`s has gone async 3. Yields the thread pool because it has run for too long This PR measures the second two. At this point only three operators can go async: * ENRICH * Reading from an empty exchange * Writing to a full exchange We're quite interested in these sleeps at the moment because we think they may be slowing things down. Here's what it looks like when a driver goes async because it wants to read from an empty exchange: ``` ... the rest of the profile ... "sleeps" : { "counts" : { "exchange empty" : 2 }, "first" : [ { "reason" : "exchange empty", "sleep" : "2024-08-13T19:45:57.943Z", "sleep_millis" : 1723578357943, "wake" : "2024-08-13T19:45:58.159Z", "wake_millis" : 1723578358159 }, { "reason" : "exchange empty", "sleep" : "2024-08-13T19:45:58.164Z", "sleep_millis" : 1723578358164, "wake" : "2024-08-13T19:45:58.165Z", "wake_millis" : 1723578358165 } ], "last": [same as above] ``` Every time the driver goes async we count it in the `counts` map - grouped by the reason the driver slept. We also record the sleep and wake times for the first and last ten times the driver sleeps. In this case it only slept twice, so the `first` and `last` ten times are the same array. This should give us a good sense about why drivers sleep while using a limited amount of memory per driver.	2024-08-20 07:29:01 +10:00
Liam Thompson	d3fdb36852	[DOCS] Fix response value in esql-query-api.asciidoc (#111882 )	2024-08-14 16:26:54 +02:00
Nik Everett	2e22e73cdf	ESQL: Remove date_nanos from generated docs (#111884 ) This removes date_nanos from the docs generated for all of our functions because it's still under construction. I've done so as a sort of one-off hack. My plan is to replace this in a follow up change with a centralized registry of "under construction" data types. So we can make new data types under a feature flag more easily in the future. We're going to be doing that a fair bit.	2024-08-15 00:22:25 +10:00
Alexander Spies	585480fe44	ESQL: Fix for overzealous validation in case of invalid mapped fields (#111475 ) Fix validation of fields mapped to different types in different indices and align with validation of fields of unsupported type. * Allow using multi-typed fields in KEEP and DROP, just like unsupported fields. * Explicitly invalidate using both these field kinds in RENAME. * Map both kinds of fields to UnsupportedAttribute to enforce consistency. * Consider convert functions containing valid multi-typed fields as resolved to avoid weird workarounds when resolving STATS. * Add a bunch of tests.	2024-08-09 09:38:14 +02:00
Liam Thompson	d3ec3a86ed	[DOCS] Document CCS enrich with api-key based auth (#111682 )	2024-08-07 19:37:16 +02:00
Mark Tozzi	67c69bb224	[ESQL] Date nanos type (#110205 ) Resolves #109987 Add initial support for the date nanos data type. At this point, almost no functions are supported, including casting. This just covers loading and returning the values. Like millisecond dates, nanosecond dates are internally modeled as long values, so we don't need a new block type to support them. This has very patchwork function support. Ideally, I don't think I would have added any function support yet, but the five MV functions you see here declare that they accept any non-spatial type, and will error tests if not wired up for new types. There are other functions, like Values, which also claim to support all non-spatial types, but don't currently enforce that in testing, so I didn't add them yet. Finally, there are functions like == which should work for all types, but are implemented as a specific list. I've left those for a follow up ticket as well.	2024-08-07 13:17:26 -04:00
Nik Everett	cc294a1a0f	ESQL: Finish migration of null testing (#111563 ) This finishes the migration of `null` testing from a test method, namely `testSimpleWithNulls`. It migrates it to `anyNullIsNull` and hand rolled null cases.	2024-08-05 12:28:15 -04:00
Fang Xing	d87254369a	type = operator in kibana operator definition (#111436 )	2024-07-31 11:07:18 -04:00
Pablo Machado	f79c62157d	ESQL: Add `MV_PSERIES_WEIGHTED_SUM` for score calculations used by security solution (#109017 ) * Create MV_RIEMANN_ZETA scalar multivalue function --------- Co-authored-by: Nik Everett <nik9000@gmail.com>	2024-07-31 12:08:28 +02:00
Iván Cea Fontenla	bc69827e1e	ESQL: WEIGHTED_AVG aggregation tests and docs (#111449 )	2024-07-31 00:42:23 +10:00
Iván Cea Fontenla	735d80dffd	ESQL: Add COUNT and COUNT_DISTINCT aggregation tests (#111409 )	2024-07-30 03:07:15 +10:00
Iván Cea Fontenla	826d49448b	ESQL: Added Median and MedianAbsoluteDeviation aggregations tests and kibana docs (#111231 )	2024-07-26 22:11:01 +10:00
Iván Cea Fontenla	595d907f61	ESQL: SpatialCentroid aggregation tests and docs (#111236 )	2024-07-26 10:41:18 +02:00
Alexander Spies	5cac9a0b7f	ESQL: Mark union types as experimental (#111297 )	2024-07-26 10:20:21 +02:00
Nik Everett	b5c6c2da30	ESQL: INLINESTATS (#109583 ) This implements `INLINESTATS`. Most of the heavy lifting is done by `LOOKUP`, with this change mostly adding a new abstraction to logical plans, an interface I'm calling `Phased`. Implementing this interface allows a logical plan node to cut the query into phases. `INLINESTATS` implements it by asking for a "first phase" that's the same query, up to `INLINESTATS`, but with `INLINESTATS` replaced with `STATS`. The next phase replaces the `INLINESTATS` with a `LOOKUP` on the results of the first phase. So, this query: ``` FROM foo \| EVAL bar = a * b \| INLINESTATS m = MAX(bar) BY b \| WHERE m = bar \| LIMIT 1 ``` gets split into ``` FROM foo \| EVAL bar = a * b \| STATS m = MAX(bar) BY b ``` followed by ``` FROM foo \| EVAL bar = a * b \| LOOKUP (results of m = MAX(bar) BY b) ON b \| WHERE m = bar \| LIMIT 1 ```	2024-07-24 17:16:37 -04:00
Fang Xing	686c96f372	docs for named and positional parameters (#111178 )	2024-07-23 08:27:34 -04:00
Fang Xing	66dd2687d5	[ES\|QL] Generate docs for unregistered esql functions from annotations (#108749 ) * render docs for operators	2024-07-22 14:58:17 -04:00
Iván Cea Fontenla	195b916e2b	ESQL: TOP aggregation IP support (#111105 ) Added IP support to TOP() aggregation. Adapted a bit the stringtemplates organization for esql/compute to (also?) work with specific datatypes. Right now it may be a bit messy, but we need the specific support for cases like this.	2024-07-22 22:35:48 +10:00
Iván Cea Fontenla	101775b93d	Added Sum aggregation tests and docs (#110984 ) - Added SUM() agg tests (Which autogenerates docs) - Converted non-finite doubles to nulls in aggregator The complete set of tests depends on https://github.com/elastic/elasticsearch/issues/110437, as commented in code. After completion, the test can be uncommented and everything should work fine	2024-07-22 21:43:58 +10:00
Iván Cea Fontenla	96e1b15b9d	ESQL: Support IP fields in MAX and MIN aggregations (#110921 ) - Support IP in MAX() and MIN() - Used a custom IpArrayState for it, as it's quite different from the `X-ArrayState.java.st` generated ones - Add IP test cases for aggregation tests	2024-07-19 23:23:13 +10:00
Iván Cea Fontenla	0e68117935	Added Percentile aggregation tests and Kibana docs (#111050 ) - Added Percentile aggregation tests and autogen docs - Added a new "appendix" section to FunctionInfo. Existing Percentile docs had a final, long section with info, and we need this to keep it. We have a "detailedDescription" attribute already, but it's right after the description, and it would make it harder to read the important bits of the function (types, examples...). So I'm not reusing it.	2024-07-19 14:28:11 +02:00
Alexander Spies	da5392134f	ESQL: Validate unique plan attribute names (#110488 ) * Enforce an invariant in our dependency checker so that logical plans never have duplicate output attribute names or ids. * Fix ROW to not produce columns with duplicate names. * Fix ResolveUnionTypes to not create multiple synthetic field attributes for the same union type. * Add tests for commands using the same column name more than once. * Update docs w.r.t. how commands behave if they are used with duplicate column names.	2024-07-17 11:39:02 +02:00
Carlos Delgado	453b82706d	Add the EXP ES\|QL function (#110879 )	2024-07-16 16:36:01 +02:00
Craig Taverner	1d6f1a0223	Union types documentation (#110183 ) * Union types documentation * Try remove asciidoc error * Another attempt * Using literal block * Nicer formatting * Remove partintro * Small refinements * Edits for clarity and style --------- Co-authored-by: Marci W <333176+marciw@users.noreply.github.com>	2024-07-16 12:06:19 +02:00
Nhat Nguyen	04845342f4	Fork field-caps for ES\|QL (#110738 ) We need to fork the field-caps API for ES\|QL to allow changes to the new internal API without risking breaking the external field-caps API.	2024-07-15 17:21:16 -07:00
Iván Cea Fontenla	43a3af66e8	ESQL: Add boolean support to TOP aggregation (#110718 ) - Added a custom implementation of BooleanBucketedSort to keep the top booleans - Added boolean aggregator to TOP - Added tests (Boolean aggregator tests, Top tests for boolean, and added boolean fields to CSV cases)	2024-07-16 03:14:29 +10:00
Nik Everett	9f001169c6	ESQL: Document the pattern to count TRUE (#110820 ) This adds an example to the docs of counting the TRUE results of an expression. You do `COUNT(a > 0 OR NULL)`. That turns the `FALSE` into `NULL`. Which you need to do because `COUNT(false)` is `1` - because it's a value. But `COUNT(null)` is `0` - because it's the absence of values. We would like to make something more intuitive for this one day. But for now, this is what works.	2024-07-12 14:08:22 -04:00
Nik Everett	55532c8d6f	ESQL: All descriptions are a full sentence (#110791 ) This asserts that all functions have descriptions that are complete sentences.	2024-07-11 16:44:15 -04:00
Nik Everett	1256a49c3a	ESQL: Move description of commands in docs (#110714 ) This copies the first line of the description of each command to just under the syntax so that it's "in order", before the `Parameters` section. That way if you are reading from top to bottom you see: ``` syntax short description parameter names and descriptions long description examples ``` I've also removed the `Description` section entirely if the description was just one sentence. So in some cases there just isn't a `long description`.	2024-07-11 08:31:35 -04:00
Nik Everett	8f93bd00f9	ESQL: Document the `profile` option (#110727 ) This adds some basic documentation for the `profile` option in ESQL but doesn't really explain the results beyond "this is for human debugging." We're not ready for any kind of specification for this thing, but it is useful to look at.	2024-07-11 22:20:31 +10:00
Nik Everett	a1695ffbea	ESQL: Documents STATS on multivalue groups (#110712 ) This documents running `STATS` on a multivalued column. It also removes a long out of date warning about a limitation of grouping.	2024-07-10 15:49:46 -04:00
Iván Cea Fontenla	2901711c46	ESQL: Add boolean support to Max and Min aggs (#110527 ) - Added support for Booleans on Max and Min - Added some helper methods to BitArray (`set(index, value)` and `fill(from, to, value)`). This way, the container is more similar to other BigArrays, and it's easier to work with Part of https://github.com/elastic/elasticsearch/issues/110346, as Max and Min are dependencies of Top.	2024-07-10 23:10:32 +10:00
Iván Cea Fontenla	5d3512fb33	ESQL: Fix Max doubles bug with negatives and add tests for Max and Min (#110586 ) `MAX()` currently doesn't work with doubles smaller than `Double.MIN_VALUE` (Note that `Double.MIN_VALUE` returns the smallest non-zero positive, not the smallest double). This PR adds tests for Max and Min, and fixes the bug (Detected by the tests). Also, as the tests now generate the docs, replaced the old docs with the generated ones, and updated the Max&Min examples.	2024-07-09 21:05:00 +10:00
Iván Cea Fontenla	38cd0b333e	ESQL: AVG aggregation tests and ignore complex surrogates (#110579 ) Some work around aggregation tests, with AVG as an example: - Added tests and autogenerated docs for AVG - As AVG uses "complex" surrogates (A combination of functions), we can't trivially execute them without a complete plan. As I'm not sure it's worth it for most aggregations, I'm skipping those cases for now, as to avoid blocking other aggs tests. The bad side effect of skipping those tests is that most tests in AvgTests are actually ignored (74 of 100)	2024-07-09 12:01:46 +02:00
Sylvain Wallez	e78bdc953a	ESQL: add Arrow dataframes output format (#109873 ) Initial support for Apache Arrow's streaming format as a response for ES\|QL. It triggers based on the Accept header or the format request parameter. Arrow has implementations in every mainstream language and is a backend of the Python Pandas library, which is extremely popular among data scientists and data analysts. Arrow's streaming format has also become the de facto standard for dataframe interchange. It is an efficient binary format that allows zero-cost deserialization by adding data access wrappers on top of memory buffers received from the network. This PR builds on the experiment made by @nik9000 in PR #104877 Features/limitations: - all ES\|QL data types are supported - multi-valued fields are not supported - fields of type _source are output as JSON text in a varchar array. In a future iteration we may want to offer the choice of the more efficient CBOR and SMILE formats. Technical details: Arrow comes with its own memory management to handle vectors with direct memory, reference counting, etc. We don't want to use this as it conflicts with Elasticsearch's own memory management. We therefore use the Arrow library only for the metadata objects describing the dataframe schema and the structure of the streaming format. The Arrow vector data is produced directly from ES\|QL blocks. --------- Co-authored-by: Nik Everett <nik9000@gmail.com>	2024-07-03 10:29:57 +02:00
Fang Xing	8abc8857f2	[ES\|QL] weighted_avg (#109993 ) * weighted_avg	2024-07-02 18:29:02 -04:00
Nik Everett	6fbc52d170	ESQL docs: Push down needs index and doc_values (#110353 ) This adds a `NOTE` to each comparison saying that pushing the comparison to the search index requires that the field have an `index` and `doc_values`. This is unique compared to the rest of Elasticsearch which only requires an `index` and it's caused by our insistence that comparisons only return true for single-valued fields. We can in future accelerate comparisons without `doc_values`, but we just haven't written that code yet.	2024-07-02 14:22:50 -04:00
Iván Cea Fontenla	c89ee3b648	ESQL: Renamed TopList to Top (#110347 ) Rename TopList aggregation to Top, after internal discussions	2024-07-02 03:52:24 +10:00
Costin Leau	b906ce3d66	ESQL: change from quoting from backtick to quote (#108395 ) * ESQL: change from quoting from backtick to quote For historical reasons, the source declaration inside FROM command is treated as an identifier, using backticks (`) for escaping the value. This is inconsistent since the source is not an identifier (field name) but an index name which has different semantics. `index` means a field name index while "index" means a literal with said value. In case of FROM, the index name/location is more like a literal (also in unquoted form) than an identifier (that is a reference to a value). This PR tweaks the grammar and plugs in the quoted string logic so that both the single quote (") and triple quote (""") are allowed. * Update grammar * Add more tests * Add a few more tests * Add extra test * Update docs/changelog/108395.yaml * Address review comments * Add doc note * Revert test rename * Fix quoting with remote cluster * Update docs/reference/esql/source-commands/from.asciidoc Co-authored-by: marciw <333176+marciw@users.noreply.github.com> --------- Co-authored-by: Bogdan Pintea <bogdan.pintea@elastic.co> Co-authored-by: Bogdan Pintea <pintea@mailbox.org> Co-authored-by: marciw <333176+marciw@users.noreply.github.com> Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>	2024-06-30 20:01:31 +03:00
Iván Cea Fontenla	fc0313f429	ESQL: Add aggregations testing base and docs (#110042 ) - Added a new `AbstractAggregationTestCase` base class for tests, that shares most of the code of function tests, adapted for aggregations. Including both testing and docs generation. - Reused the `AbstractFunctionTestCase` class to also let us test evaluators if the aggregation is foldable - Added a `TopListTests` example - This includes the docs for Top_list _(Also added a missing include of Ip_prefix docs)_ - Adapted Kibana docs to use `type: "agg"` (@drewdaemon) The current tests are very basic: Consume a page, generate an output, all in Single aggregation mode (No intermediates, no grouping). More complex testing will be added in future PRs Initial PR of https://github.com/elastic/elasticsearch/issues/109917	2024-06-27 21:21:55 +10:00
Craig Taverner	536d614694	ES\|QL ST_DISTANCE Function (#108764 ) * WIP Started refactoring in preparation for ST_DISTANCE * Initial evaluators for ST_DISTANCE * Update docs/changelog/108764.yaml * Fix invalid changelog generated by CI * Register function and get unit tests working * Fixed failing meta function description tests, and refined descriptions * Added initial CsvTests and calculate Geo differently to Cartesian * Added more csv-spec tests and changed to arcDistance for accuracy * Added generated docs files * Link to generated docs * Fix examples tag for linking from generated docs * Skip wrapper function And note that we might want to include instead some of the related intelligence from Circle2D::HaversineDistance class * Added ST_DWITHIN and more tests for ST_DISTANCE and ST_DWITHIN * Code style * Added more tests, this time for sorting on distance * Fixes after rebase on main * The ST_DWITHIN cannot use BinarySpatialFunction because it is ternary So we moved the common code to a separate SpatialTypeResolver, and made a simpler TernarySpatialFunction based on a simple TernaryScalarFunction. This had additional consequences, simplifying the points-only cases. The main reason for this change was to support StDWithinTests which need to test a lot of things that involve varying all three input types, generating expected error strings, etc. The original hack of just adding to BinarySpatialFunction worked for the actual integration tests, but clearly did not satisfy all the use cases tested by the unit tests. We also restricted ST_DWITHIN to take only a double as the third argument, because otherwise the number of evaluators would explode, since we need a separate evaluator for each Block type, and Integer and Double use different block types. 
* Fixed function count after rebasing on main * Update docs/changelog/108764.yaml * Added generated docs for ST_DWITHIN * Connect docs for ST_DWITHIN * Add back issue link * Remove support for ST_DWITHIN * Update docs/changelog/108764.yaml * Bring back link to issue in changelog * Update x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/expression/function/scalar/spatial/StDistance.java Co-authored-by: Ignacio Vera <iverase@gmail.com> * Revert reformatting of function descriptions We should put this into a separate PR * Github merged commit with incorrectly formatted whitespace --------- Co-authored-by: Ignacio Vera <iverase@gmail.com>	2024-06-21 11:59:44 +02:00
Nik Everett	b35f0ed48d	ESQL: Make a table of all inline casts (#109713 ) This adds a test that generates `docs/reference/esql/functions/kibana/inline_cast.json` which is a json object whose keys are the names of valid inline casts and whose values are the resulting data types. I also moved one of the maps we use to make the inline casts to `DataType`, which is a place where we want it.	2024-06-18 06:23:11 -04:00
Nik Everett	2aade9dd66	ESQL: Warn about division (#109716 ) When you divide two integers or two longs we round towards 0. Like Postgres or Java or Rust or C. Other systems, like MySQL or SPL or Javascript or Python always produce a floating point number. We should warn folks about this. It's genuinely unexpected for some folks. OTOH, converting into a floating point number would be unexpected for other folks. Oh well, let's document what we've got.	2024-06-14 08:36:27 -04:00
Luigi Dell'Aquila	47edae4fbd	ES\|QL: reduce memory footprint for MvAppendTests with shapes (#109517 ) Fixing MvAppendTests CB exceptions by generating smaller geometries: the test generates a lot of documents and the CB is too small for multiple big shapes. Fixes https://github.com/elastic/elasticsearch/issues/109409	2024-06-13 02:44:49 +10:00
Liam Thompson	394d2b09a6	Revert "[DOCS] Remove ESQL demo env link from 8.14+ (#109562 )" (#109579 ) This reverts commit `0480c1acba`.	2024-06-11 17:04:37 +02:00
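
The `sleeps` fragment quoted in commit dc24003540 (#111855) records each sleep and wake as epoch milliseconds, so the time a driver spent asleep can be recovered by subtraction. The following is a minimal Python sketch, assuming only the fields shown in that commit message. The rest of the profile is elided in the source and is not modelled here, and the helper names `sleep_durations` and `total_by_reason` are hypothetical, not part of Elasticsearch.

```python
import json

# Fragment copied from the commit message for dc24003540 (#111855).
# Only the "sleeps" object is reproduced; the "last" array is omitted
# because the commit message says it is the same as "first".
SLEEPS_JSON = """
{
  "counts": {"exchange empty": 2},
  "first": [
    {"reason": "exchange empty",
     "sleep": "2024-08-13T19:45:57.943Z", "sleep_millis": 1723578357943,
     "wake": "2024-08-13T19:45:58.159Z", "wake_millis": 1723578358159},
    {"reason": "exchange empty",
     "sleep": "2024-08-13T19:45:58.164Z", "sleep_millis": 1723578358164,
     "wake": "2024-08-13T19:45:58.165Z", "wake_millis": 1723578358165}
  ]
}
"""


def sleep_durations(sleeps):
    """Return (reason, milliseconds asleep) for each entry in "first"."""
    return [
        (entry["reason"], entry["wake_millis"] - entry["sleep_millis"])
        for entry in sleeps["first"]
    ]


def total_by_reason(sleeps):
    """Sum the recorded sleep time in milliseconds, grouped by reason."""
    totals = {}
    for reason, millis in sleep_durations(sleeps):
        totals[reason] = totals.get(reason, 0) + millis
    return totals


sleeps = json.loads(SLEEPS_JSON)
durations = sleep_durations(sleeps)
totals = total_by_reason(sleeps)
print(durations)
print(totals)
```

Because only the first and last ten sleeps are recorded, totals computed this way cover every sleep only when `counts` reports ten or fewer for each reason, as in this example.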

333 Commits