* Inline cast to date
* Update docs/changelog/123460.yaml
* New capability for `::date` casting
* More tests
* Update tests
---------
Co-authored-by: Fang Xing <155562079+fang-xing-esql@users.noreply.github.com>
Resolves#123053
This adds the thread name to the driver sleep profile output.
---------
Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>
Adds options to QSTR function.
#118619 added named function parameters. This PR uses this mechanism for allowing query string function parameters, so query string parameters can be used in ES|QL.
Closes#120933
The original work at https://github.com/elastic/elasticsearch/pull/106065 did not support geospatial types with this comment:
> I made this work for everything but geo_point and cartesian_point because I'm not 100% sure how to integrate with those. We can grab those in a follow up.
The geospatial types should be possible to collect using the VALUES aggregation with similar behavior to the `ST_COLLECT` OGC function, based on the Elasticsearch convention that treats multi-value geospatial fields as behaving similarly to any geometry collection. So this implementation is a trivial addition to the existing values types support.
* Allow data stream reindex tasks to be re-run after completion
* Docs update
* Update docs/reference/migration/apis/data-stream-reindex.asciidoc
Co-authored-by: Keith Massey <keith.massey@elastic.co>
---------
Co-authored-by: Keith Massey <keith.massey@elastic.co>
* Allow setting the `type` in the reroute processor
This allows configuring the `type` from within the ingest `reroute` processor. Similar to `dataset`
and `namespace`, the type defaults to the value extracted from the index name. This means that
documents sent to `logs-mysql.access.default` will have a default value of `logs` for the type.
Resolves#121553
* Update docs/changelog/122409.yaml
This updates the kibana signature json files in two ways:
* Renames `eval` to `scalar` - that's the name we use inside of ESQL and
we may as well make the name the same.
* Calls the `CATEGORIZE` and `BUCKET` function `grouping` because they
can only be used in the "grouping" positions of the `STATS` command.
Closes#113411
The semantic text format was updated in #119183. This commit removes the last remaining reference to the old format from the documentation to ensure consistency.
This commit introduces the `MappedFieldType#getDefaultHighlighter`, allowing a specific highlighter to be enforced for a field.
The semantic field mapper utilizes this new functionality to set the `semantic` highlighter as the default.
All other fields will continue to use the `unified` highlighter by default.
Today, Elasticsearch supports two models to establish secure connections
and trust between two Elasticsearch clusters:
- API key based security model
- Certificate based security model
This PR deprecates the _Certificate based security model_ in favour of *API key based security model*.
The _API key based security model_ is preferred way to configure remote clusters,
as it allows to follow security best practices when setting up remote cluster connections
and defining fine-grained access control.
Users are encouraged to migrate remote clusters from certificate to API key authentication.
* (Doc+) Expand watermark resolution
Relaunch https://github.com/elastic/elasticsearch/pull/116892 since the original one seems to be outdated and hard to update branch.
* Apply suggestions from code review
Co-authored-by: shainaraskas <58563081+shainaraskas@users.noreply.github.com>
---------
Co-authored-by: shainaraskas <58563081+shainaraskas@users.noreply.github.com>
API keys are high-entropy secure random strings. This means that the
additional work factor of functions like PBKDF or bcrypt are not
necessary, and a faster hash function like salted SHA-256 provides
adequate security against offline attacks (hash collision, brute force,
etc.).
This PR adds `SSHA-256` to the list of supported stored hash algorithms
for API key secrets, and makes it the default algorithm. Additionally,
this PR changes the format of API key secrets, moving from an encoded
UUID to a random string which increase the entropy of API keys from 122
bits to 128 bits, without changing overall secret length.
Relates: ES-9504
With the introduction of our new backing algorithm and making rescoring
easier with the `rescore_vector` API, let's mark bbq as GA.
Additionally, this commit adds rolling upgrade tests to ensure
stability.
Building off of `stats` and multi-value aggregations, including the
limitation:
- all values of extended_stats will be mapped to `double` if mapping
deduction is used
Relates #51925
Enable logsdb by default if logsdb.prior_logs_usage has not been set to true.
Meaning that if no data streams were created matching with the logs-- pattern in 8.x, then logsdb will be enabled by default for data streams matching with logs-*-* pattern.
Also removes LogsPatternUsageService as with version 9.0 and beyond, this component is no longer necessary.
Followup from #120708Closes#106489
* (Doc+) Clarify dimension field requirements for time_series aggregation
👋 howdy, team!
This PR adds a note explaining that time series indices require:
- index.mode set to "time_series"
- at least one dimension field with time_series_dimension: true
- a routing_path array listing those dimension fields
Without these settings, the time_series aggregation may return empty buckets or behave unexpectedly. By emphasizing the dimension field requirement, we help users configure their time series indices correctly and see meaningful aggregation results.
* Apply suggestions from code review
Co-authored-by: shainaraskas <58563081+shainaraskas@users.noreply.github.com>
---------
Co-authored-by: Liam Thompson <32779855+leemthompo@users.noreply.github.com>
Co-authored-by: shainaraskas <58563081+shainaraskas@users.noreply.github.com>
Add documentation for new REST endpoints related to data stream upgrade.
Endpoints:
- /_migration/reindex
- /_migration/reindex/{index}/_status
- /_migration/reindex/{index}/_cancel
- /_create_from/{source}/{dest}
There are many features of the Elasticsearch ecosystem that may malfunction, or fail to work entirely, if these templates are not installed. This commit adds documentation cautioning against disabling the installation of templates.
* [DOCS] Update documentation for index sorting and routing for logsdb
* update
* Apply suggestions from code review
Co-authored-by: Marci W <333176+marciw@users.noreply.github.com>
* Update logs.asciidoc
* Update docs/reference/data-streams/logs.asciidoc
Co-authored-by: Marci W <333176+marciw@users.noreply.github.com>
* Update logs.asciidoc
---------
Co-authored-by: Marci W <333176+marciw@users.noreply.github.com>
Resolves#109999
This adds support for date nanos in the date diff function, as well as mixed nanos/millis use cases.
---------
Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>
Add capability to stop async query on demand
The theory:
- User initiates async search request
- User sends the stop request (POST _query/async/<ID>/stop)
- If the async is finished by that time, it's like regular async get
- If it's not finished, the sinks are closed and the request is forcefully finished
* ESQL: Signatures for `NOT IN` et al
This generates signatures for `NOT IN`, `NOT LIKE`, and `NOT RLIKE`
using a small hack on top of the process used to generate the signatures
for `IN`, `LIKE`, and `RLIKE`. This is a very perl-worth hack, replacing
`LIKE` with `NOT LIKE` in the description. But it's useful for our
kibana friends and if we need to make it nicer we can do so later.
* Zap
Resolve/cluster allows querying for cluster-info-only (no index expression required)
This enhancement provides users with the ability to query the _resolve/cluster API endpoint without specifying
an index expression to match against. This allows users to quickly test what remote clusters are configured on
a cluster and whether they are available for querying.
The new endpoint takes no index expression:
```
GET _resolve/cluster
```
and returns the same information as before except for the "matching_indices" field. Example response:
```
{
"remote1": {
"connected": false,
"skip_unavailable": true
},
"remote2": {
"connected": true,
"skip_unavailable": false,
"version": {
"number": "8.17.0",
"build_flavor": "default",
"minimum_wire_compatibility_version": "7.17.0",
"minimum_index_compatibility_version": "7.0.0"
}
}
}
```
For backwards compatibility, this new endpoint works with clusters from older versions by querying with the index expression `dummy*` on those older clusters and ignoring the matching_indices value in the response they return.
This deprecated feature is being removed in 9.0, so the telemetry is
no longer needed.
The usage action is retained to support mixed v8/v9 clusters, with
annotations to remove in V10. But it is no longer registered in
`XPackUsageFeatureAction.ALL` and so the usage data is no longer
reported by `GET _xpack/usage`, and if invoked it always returns a
count of 0.
ES-9736 # comment Removed the telemetry in https://github.com/elastic/elasticsearch/pull/119890
Semantic text fields now support multi-fields, either as part of a multi-field structure or containing multi-fields internally.
This enhancement aligns with the semantic text field's current behavior as a standard text field.
Note: Multi-field support is only available for the new index format. Attempting to set a multi-field on an index created with the older format will still result in a failure.
There were different error cases with `ROUND(number, decimals)`:
- Decimals accepted unsigned longs, but threw a 500 with a `can't process [unsigned_long -> long]` in the cast evaluator
- Fixed by improving the `resolveType()`
- If the number was a BigInteger unsigned long, there were 2 cases throwing an exception:
1. Negative decimals outside the range of integer: Error
2. Negative decimals insie the range of integer, but "big enough" for `BigInteger.TEN.pow(...)` to throw a `BigInteger would overflow supported range`
3. -19 decimals with big unsigned longs like `18446744073709551615` was throwing an `unsigned_long overflow`
Also, when the number is a BigInteger and the decimals is a big negative (but not big enough to throw), it may be **very** slow. Taking _many_ seconds for a single computation (It tries to calculate a `10^(big number)`. I didn't do anything here, but I wonder if we should limit it.
To solve most of the cases, a warnExceptions was added for the overflow case, and a guard clause to return 0 for <-19 decimals on unsigned longs.
Another issue is that rounding to a number like 7 to -1 returns 0 instead of 10, which may be considered an error. But it's consistent, so I'm leaving it to another PR
This reduces the number of test cases in ESQL a little more ala #119678.
It migrates a few random tests and all of the multivalue functions:
```
92775 -> 43760
3m45 -> 4m04
```
This adds a few more error test cases that were missing to make sure it all
lines up well. And it fixes a few error messages in a few functions. That's
*likely* where the extra time goes.
* Added additional entries for troubleshooting unhealthy cluster
Reordered "Re-enable shard allocation" because not as common as other causes
Added additional causes of yellow statuses
Changed watermark commadn to include high and low watermark so users can make their cluster operate once again.
* Drive-by copyedit with suggestions for concision and some formatting fixes.
* Concision and some formatting fixes.
* Colon added
* Update docs/reference/troubleshooting/common-issues/red-yellow-cluster-status.asciidoc
* Title change
* Update docs/reference/troubleshooting/common-issues/red-yellow-cluster-status.asciidoc
* Spelling fix
* Update docs/reference/troubleshooting/common-issues/red-yellow-cluster-status.asciidoc
* Update docs/reference/troubleshooting/common-issues/red-yellow-cluster-status.asciidoc
* Update docs/reference/troubleshooting/common-issues/red-yellow-cluster-status.asciidoc
* Update docs/reference/troubleshooting/common-issues/red-yellow-cluster-status.asciidoc
---------
Co-authored-by: Kofi B <seanziee@gmail.com>
Co-authored-by: Liam Thompson <32779855+leemthompo@users.noreply.github.com>
Co-authored-by: shainaraskas <58563081+shainaraskas@users.noreply.github.com>
This adds support for passing Date Nanos into the Date Format function. It works for both the single argument and two argument versions. Format strings are unchanged, as the same formatting logic works for both resolutions.
resolves#109994
---------
Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>
I forgot to link the ToDateNanos docs when I merged that function.
---------
Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>
This adds a sentence to `redirects.asciidoc` explaining what frozen
indices were - otherwise, everything will point to the message about
the unfreeze API having gone away, which is not very helpful. Some
cross-references are updated to point to this rather than to the
notice about the removal of the unfreeze API.
ES-9736 #comment Removed `_unfreeze` REST endpoint in https://github.com/elastic/elasticsearch/pull/119227
`fold` can be surprisingly heavy! The maximally efficient/paranoid thing
would be to fold each expression one time, in the constant folding rule,
and then store the result as a `Literal`. But this PR doesn't do that
because it's a big change. Instead, it creates the infrastructure for
tracking memory usage for folding as plugs it into as many places as
possible. That's not perfect, but it's better.
This infrastructure limit the allocations of fold similar to the
`CircuitBreaker` infrastructure we use for values, but it's different
in a critical way: you don't manually free any of the values. This is
important because the plan itself isn't `Releasable`, which is required
when using a real CircuitBreaker. We could have tried to make the plan
releasable, but that'd be a huge change.
Right now there's a single limit of 5% of heap per query. We create the
limit at the start of query planning and use it throughout planning.
There are about 40 places that don't yet use it. We should get them
plugged in as quick as we can manage. After that, we should look to the
maximally efficient/paranoid thing that I mentioned about waiting for
constant folding. That's an even bigger change, one I'm not equipped
to make on my own.
* Including examples
* Using js instead of json
* Adding unified docs to main page
* Adding missing description text
* Refactoring to remove unified route
* Addign back references to the _unified route
* Update docs/reference/inference/chat-completion-inference.asciidoc
Co-authored-by: István Zoltán Szabó <istvan.szabo@elastic.co>
* Address feedback
---------
Co-authored-by: István Zoltán Szabó <istvan.szabo@elastic.co>
This wires up the randomized testing for DateFormat. Prior to this PR, none of the randomized testing was hitting the one parameter version of the function, so I wired that up as well. This required some compromises on the type signatures, see comments in line.less
---------
Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>
This action solely needs the cluster state, it can run on any node.
Additionally, it needs to be cancellable to avoid doing unnecessary work
after a client failure or timeout.
As a drive-by, this removes another usage of the trappy default master
node timeout.
This action solely needs the cluster state, it can run on any node.
Additionally, it needs to be cancellable to avoid doing unnecessary work
after a client failure or timeout.
As a drive-by, this removes another usage of the trappy default master
node timeout.
Fixes two bugs in _resolve/cluster.
First, the code that detects older clusters versions and does a fallback to the _resolve/index
endpoint was using an outdated string match for error detection. That has been adjusted.
Second, upon security exceptions, the _resolve/cluster endpoint was marking the clusters as connected: true,
under the assumption that all security exceptions related to cross cluster calls and remote index access were
coming from the remote cluster, but that is not always the case. Some cross-cluster security violations can
be detected on the local querying cluster after issuing the remoteClient.execute call but before the transport
layer actually sends the request remotely. So we now mark the connected status as false for all ElasticsearchSecurityException cases. End user docs have been updated with this information.
* [Connector API] Add interface for soft-deletes
* Define connector deleted system index
* Got soft-delete logic working
* Add unit tests
* Add yaml e2e test and attempt to update permissions
* Fix permissions
* Update docs
* Fix docs
* Update docs/changelog/118282.yaml
* Change logic
* Fix tests
* Remove unnecessary privilege from yaml rest test
* Update changelog
* Update docs/changelog/118669.yaml
* Adapt yaml tests
* Undo changes to muted-tests.yml
* Fix compilation issue after other PR got merged
* Exclude soft-deleted connector from checks about index_name already in use
* Update docs/reference/connector/apis/get-connector-api.asciidoc
Co-authored-by: Tim Grein <tim@4greins.de>
* Update rest-api-spec/src/main/resources/rest-api-spec/api/connector.list.json
Co-authored-by: Tim Grein <tim@4greins.de>
* Adapt comments, add connector wire serializing test
* Introduce new transport versions for passing the delete flag
* Get rid of wire serialisation, use include_deleted instead of deleted flag
* Remove unused import
* Final tweaks
* Adapt variable name in rest layer
---------
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
Co-authored-by: Tim Grein <tim@4greins.de>
I believe this is a typo, as in our 8.16.1 cluster this field appears to be called `combined_coordinating_and_primary`
Co-authored-by: Ian Lee <IanLee1521@gmail.com>
This PR introduces a new syntactical feature to index expression resolution: The selector.
Selectors, denoted with a :: followed by a recognized suffix will allow users to specify which component of
an index abstraction they would like to operate on within an API call. In this case, an index abstraction is a
concrete index, data stream, or alias; Any abstraction that can be resolved to a set of indices/shards. We
define a component of an index abstraction to be some searchable unit of the index abstraction.
This exposes new OTel node and index based metrics for indexing failures due to version conflicts.
In addition, the /_cat/shards, /_cat/indices and /_cat/nodes APIs also expose the same metric, under the newly added column iifvc.
Relates: #107601
When originally added, the knn query didn't apply `top-k` restrictions
to the query. Instead it would allow the resulting `num_candidate` to be
combined with sibling queries without restricting to `top-size` results
ahead of time.
This honestly is confusing behavior and leads to some bugs in understand
how it all works.
This commit addresses this by eagerly gathering only `size` results when
`k==null` before combining with other queries.
To achieve the previous behavior, this can be done directly by setting
`k==num_candidates` in the query.
Late-interaction models are powerful rerankers. While their size and
overall cost doesn't lend itself for HNSW indexing, utilizing them as
second order "brute-force" reranking can provide excellent boosts in
relevance. At generally lower inference times than large cross-encoders.
This commit exposes a new experimental `rank_vectors` field that allows
for maxSim operations. This unlocks the initial, and most common use of
late-interaction dense-models.
For example, this is how you would use it via the API:
```
PUT index
{
"mappings": {
"properties": {
"late_interaction_vectors": {
"type": "rank_vectors"
}
}
}
}
```
Then to index:
```
POST index/_doc
{
"late_interaction_vectors": [[0.1, ...],...]
}
```
For querying, scoring can be exposed with scripting:
```
POST index/_search
{
"query": {
"script_score": {
"query": {
"match_all": {}
},
"script": {
"source": "maxSimDotProduct(params.query_vector, 'my_vector')",
"params": {
"query_vector": [[0.42, ...], ...]
}
}
}
}
}
```
Of course, the initial ranking should be done before re-scoring or
combining via the `rescore` parameter, or simply passing whatever first
phase retrieval you want as the inner query in `script_score`.
Shutdown metadata is keyed on node id. This makes sense since only one
node with a given node id can exist within a cluster. However, it is
possible that shutdown was initiated for one instance of a node, but
that node is restarted. This commit adds the ephemeral node id to
shutdown metadata so that nodes with the same id but different ephemeral
id can be distinguished.
* (Doc+) Enrich run on ingest+data nodes not coordinating-only
👋 howdy, team! I'm not otherwise finding it so documenting https://github.com/elastic/elasticsearch/issues/95969 in ES docs
> Currently we tell users of enrich that they should co-locate the nodes that perform the enrichment (ingest nodes) with the actual enrich data so that enrich operations don't require a remote search operation.
* feedback
Co-authored-by: Lee Hinman <dakrone@users.noreply.github.com>
---------
Co-authored-by: Lee Hinman <dakrone@users.noreply.github.com>
This action solely needs the cluster state, it can run on any node.
Additionally, it needs to be cancellable to avoid doing unnecessary work
after a client failure or timeout.
The `?local` parameter becomes a no-op and is marked as deprecated.
Relates #101805
Relates #107984
This ensures the usage API docs tests are passing again. We achieve this
by: 1. ignoring the contents of `inference.models` because the models
might not yet have been initialized and 2. adding missing fields to the
`logsdb` usage.
This pull request introduces a new retriever called `rescorer`, which leverages the `rescore` functionality of the search request.
The `rescorer` retriever re-scores only the top documents retrieved by its child retriever, offering fine-tuned scoring capabilities.
All rescorers supported in the `rescore` section of a search request are available in this retriever, and the same format is used to define the rescore configuration.
<details>
<summary>Example:</summary>
```yaml
- do:
search:
index: test
body:
retriever:
rescorer:
rescore:
window_size: 10
query:
rescore_query:
rank_feature:
field: "features.second_stage"
linear: { }
query_weight: 0
retriever:
standard:
query:
rank_feature:
field: "features.first_stage"
linear: { }
size: 2
```
</details>
Closes#118327
Co-authored-by: Liam Thompson <32779855+leemthompo@users.noreply.github.com>
* Replace 'ent-search-generic' with 'search-default' pipeline
* missed one
* [CI] Auto commit changes from spotless
---------
Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>
#118443 added a new index version for indices that can be opened in read-only mode by Lucene. This change adds this information to the discovery node's VersionInformation and the transport serialization logic.
In a short future we'd like to use this information in methods like IndexMetadataVerifier#checkSupportedVersion and NodeJoineExecutor to allow opening indices in N-2 versions as read-only indices on ES V9.
Previously, Kibana was authorizing (and granting application privileges)
to create reports, simply based on the `reporting_user` role name. This
PR makes these application privileges explicitly granted to the
`reporting_user` role.
This adds support for running the bucket function over a date nanos field. Code wise, this just delegates to DateTrunc, which already supports date nanos, so most of the PR is just tests and the auto-generated docs.
Resolves#118031
This PR adds support for ST_EXTENT_AGG aggregation, i.e., computing a bounding box over a set of points/shapes (Cartesian or geo). Note the difference between this aggregation and the already implemented scalar function ST_EXTENT.
This isn't a very efficient implementation, and future PRs will attempt to read these extents directly from the doc values.
We currently always use longitude wrapping, i.e., we may wrap around the dateline for a smaller bounding box. Future PRs will let the user control this behavior.
Fixes#104659.
Make the dependency checker for query plans take into account binary plans and make sure that fields required from the left hand side are actually obtained from there (and analogously for the right).
Enhance documenation to explain that "_index_prefix" subfield must
be added to `matched_fields` param for highlighting a main field.
When doing prefix queries on fields that are indexed with prefixes,
"_index_prefix" subfield is used. If we try to highlight the main
field, we may not get any results. "_index_prefix" subfield must
be added to `matched_fields` which instructs ES to use matches
from "_index_prefix" to highlight the main field.
Removes the old `_knn_search` API that was never out of tech preview and
deprecated throughout the v8 cycle.
To utilize the API, `compatible-with=8` can be utilized.
We are not going to make this change in V9. We may do it in V10. This
change just bumps the annotation to remind us to revisit.
Since we are living with this for a while, it seems worth improving
the documentation. This now encourages explicitly setting the option
one way or the other, since you get a warning if you omit it. It also
changes the existing examples to use true rather than false, as that's
our recommendation. And it adds a new section with an example where
it's true, and moves the content previously in a note into that
section.
I've updated the listener for GET /_query/async/{id} to EsqlResponseListener, so it now accepts parameters (delimiter, drop_null_columns and format) like the POST /_query API. Additionally, I have added tests to verify the correctness of the code.
You can now set the format in the request parameters to specify the return style.
Closes#110926
* Adds new default inference information
* Update docs/reference/mapping/types/semantic-text.asciidoc
Co-authored-by: István Zoltán Szabó <istvan.szabo@elastic.co>
* Update docs/reference/search/search-your-data/semantic-search-semantic-text.asciidoc
Co-authored-by: István Zoltán Szabó <istvan.szabo@elastic.co>
* Update docs/reference/mapping/types/semantic-text.asciidoc
Co-authored-by: David Kyle <david.kyle@elastic.co>
---------
Co-authored-by: István Zoltán Szabó <istvan.szabo@elastic.co>
Co-authored-by: David Kyle <david.kyle@elastic.co>
This PR introduces a new highlighter, `semantic`, tailored for semantic text fields.
It extracts the most relevant fragments by scoring nested chunks using the original semantic query.
In this initial version, the highlighter returns only the original chunks computed during ingestion. However, this is an implementation detail, and future enhancements could combine multiple chunks to generate the fragments.
Resolves#116281
Introduces support for comparing millisecond dates with nanosecond dates, without the need for casting. Millisecond dates outside of the nanosecond date range are handled correctly.
This PR introduces an option for `sparse_vector` to store its values separately from `_source` by using term vectors.
This capability is primarly needed by the semantic text field.
This will make `TransportLocalClusterStateAction` wait for a new state
that is not blocked. This means we need a timeout (again). For
consistency's sake, we're reusing the REST param `master_timeout` for
this timeout as well.
The only class that was using `TransportLocalClusterStateAction` was
`TransportGetAliasesAction`, so its request needed to accept a timeout
again as well.
Resolves#109995
This adds support and tests for addition and subtraction of date nanos with periods and durations. It does not include support for date_diff, which is a separate ticket (#109999). The bulk of the PR is testing, the actual date math is all handled by library functions.
---------
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
The version of the AWS Java SDK we use already magically switches to
IMDSv2 if available, but today we cannot claim to support IMDSv2 in
Elasticsearch since we have no tests demonstrating that the magic really
works for us. In particular, this sort of thing often risks falling foul
of some restrictions imposed by the security manager (if not now then
maybe in some future release).
This commit adds proper support for IMDSv2 by enhancing the test suite
to add the missing coverage to avoid any risk of breaking this magical
SDK behaviour in future.
Closes#105135 Closes ES-9984
Re-implement `CATEGORIZE` in a way that works for multi-node clusters.
This requires that data is first categorized on each data node in a first pass, then the categorizers from each data node are merged on the coordinator node and previously categorized rows are re-categorized.
BlockHashes, used in HashAggregations, already work in a very similar way. E.g. for queries like `... | STATS ... BY field1, field2` they map values for `field1` and `field2` to unique integer ids that are then passed to the actual aggregate functions to identify which "bucket" a row belongs to. When passed from the data nodes to the coordinator, the BlockHashes are also merged to obtain unique ids for every value in `field1, field2` that is seen on the coordinator (not only on the local data nodes).
Therefore, we re-implement `CATEGORIZE` as a special BlockHash.
To choose the correct BlockHash when a query plan is mapped to physical operations, the `AggregateExec` query plan node needs to know that we will be categorizing the field `message` in a query containing `... | STATS ... BY c = CATEGORIZE(message)`. For this reason, _we do not extract the expression_ `c = CATEGORIZE(message)` into an `EVAL` node, in contrast to e.g. `STATS ... BY b = BUCKET(field, 10)`. The expression `c = CATEGORIZE(message)` simply remains inside the `AggregateExec`'s groupings.
**Important limitation:** For now, to use `CATEGORIZE` in a `STATS` command, there can be only 1 grouping (the `CATEGORIZE`) overall.
I goofed on the bit * byte and bit * float comparisons. Naturally, these
should be bigendian and compare the dimensions with the binary ones
appropriately.
Additionally, I added a test to ensure that this is handled correctly.
* Add timezone support to Cron objects
* Add timezone support to CronnableSchedule
* XContent change to support parsing and display of TimeZone fields on schedules
* Case insensitive timezone parsing
* Doc changes
* YAML REST tests
* Equals, toString and HashCode now include timezone
* Additional random testing for DST transitions
* Migrate Cron class to use wrapped LocalDateTime
The algorithm depends on some quirks of calendar but LocalDateTime
correctly ignores DST during calculations so this uses a LocalDateTime
with a wrapper to emulate some of Calendar's behaviours that the Cron
algorithm depends on
* Additional documentation to explain discontinuity event behaviour
* Remove redundant conversions from ZoneId to TimeZone following move to LocalDateTime
* Add documentation warning that manual clock changes will cause unpredictable watch execution
* Update docs/reference/watcher/trigger/schedule.asciidoc
Co-authored-by: Lee Hinman <dakrone@users.noreply.github.com>
---------
Co-authored-by: Lee Hinman <dakrone@users.noreply.github.com>
This enables date nanos support as tech preview. Basic operations, like reading values, binary comparisons, and functions that don't care about type should work, but some functions are not yet supported. Most notably, Bucket is not yet supported, although Date_Trunc is and can be used for grouping. See the docs for the full list of limitations.
relates to #109352
First PR for adding LOOKUP JOIN in ESQL.
Introduces grammar and wires the main building blocks to execute a query; follow-ups are required (see #116208 for more details).
Co-authored-by: Nik Everett <nik9000@users.noreply.github.com>