Add the unary scalar function CEIL.
Analogous to FLOOR, which rounds down, it rounds its argument up to the nearest integer.
- Implement CEIL, add it to the function registry and make sure it is serializable.
- Add csv tests, unit tests and docs.
- Add additional csv tests with different data types and some edge cases for both CEIL and FLOOR
- Add unit tests and update docs for FLOOR.
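As a quick illustration of the expected rounding behaviour, here is a minimal Python sketch. It uses Python's `math.ceil` and `math.floor` as stand-ins and is not the ES|QL implementation; the helper names `ceil_value` and `floor_value` are hypothetical.

```python
import math


def ceil_value(x):
    """Round x up to the nearest integer (stand-in for CEIL)."""
    return math.ceil(x)


def floor_value(x):
    """Round x down to the nearest integer (stand-in for FLOOR)."""
    return math.floor(x)
```

Negative inputs are the usual edge case: rounding up moves towards zero and rounding down moves away from it, so `ceil_value(-1.2)` is `-1` while `floor_value(-1.2)` is `-2`.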
Locks the railroad diagrams to always use the same font, `roboto mono`.
This makes sure that when we render the railroad diagrams we always size
them the same way, because everyone has a copy of roboto mono: gradle
resolves that dependency.
This generates a "railroad diagram" svg image that can be embedded into
the docs for any function to explain its syntax. It's basic, but it's
something we can iterate on.
It also generates a table of supported types from the list of types that
we test. It can be included in the docs for reference as well.
* New docs structure for remote clusters
* Fix broken cross-book link errors
* More broken cross-book link errors
* Remove redirects for new pages
* Link to generic remote cluster docs instead
* Drop 'API' from the abbreviated title
* Add 'Establish trust with a remote cluster' section
* Restructure 'Establish trust' section into Prerequisite/local/remote instructions
* Add 'Configure roles and users' section
* Add 'Connect to a remote cluster' section
* Move version compatibility to prerequisites
* Fix test errors
* Incorporate review feedback
* Mention version 8.10 or later in the intro for API keys
* Add license prerequisite
Currently the `GET target/_lifecycle/explain` API only works for
indices. In this PR we extend this behaviour to allow the target to be a
data stream so we can get the overview lifecycle status for all the
backing indices of a data stream.
Report node "roles" in the /_cluster/allocation/explain response.
Nodes with limited sets of roles may affect shard distribution in ways
users did not originally consider, so it is helpful to surface this
information along with node allocation decision explanations.
* Add 'dataset' size to cat indices and cat shards
This adds the `dataset` computed size for the `/_cat/indices` and `/_cat/shards` APIs. This new
column is reported by default.
Resolves #95092
This makes the data stream lifecycle generally available. This will allow
data streams to take advantage of a native simplified and resilient
lifecycle implementation.
Here we add support for the following two ESQL functions:
* LTRIM: remove leading spaces from a string
* RTRIM: remove trailing spaces from a string
We also fix an issue with the handling of unicode white spaces. We
make use of unicode code points to identify unicode whitespace
characters instead of relying on ASCII codes.
Moreover, iterating bytes in a Unicode string needs to consider
that some Unicode characters are encoded using multiple bytes.
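The byte-versus-code-point concern can be sketched in Python as follows. This is an illustration only: it assumes valid UTF-8 input, the function names are hypothetical, and Python's `str.isspace` is a stand-in whose whitespace set may differ from the one the actual implementation uses.

```python
def _code_point_length(lead_byte: int) -> int:
    """Number of bytes in the UTF-8 sequence that starts with lead_byte."""
    if lead_byte < 0x80:
        return 1
    if lead_byte >> 5 == 0b110:
        return 2
    if lead_byte >> 4 == 0b1110:
        return 3
    return 4


def ltrim(buf: bytes) -> bytes:
    """Drop leading whitespace, stepping one whole code point at a time."""
    start = 0
    while start < len(buf):
        n = _code_point_length(buf[start])
        if not buf[start:start + n].decode("utf-8").isspace():
            break
        start += n
    return buf[start:]


def rtrim(buf: bytes) -> bytes:
    """Drop trailing whitespace, walking back over continuation bytes."""
    end = len(buf)
    while end > 0:
        start = end - 1
        # Continuation bytes look like 0b10xxxxxx; back up to the lead byte.
        while start > 0 and (buf[start] & 0xC0) == 0x80:
            start -= 1
        if not buf[start:end].decode("utf-8").isspace():
            break
        end = start
    return buf[:end]
```

A byte-by-byte comparison against ASCII space would miss a character such as U+2003 (EM SPACE), which takes three bytes in UTF-8.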
* First version
* Spotless, I liked my version better
* Fix param default values
* Add a supplier for default value to ensure it's calculated correctly
* Can't improve this without breaking tests
* Added checks for not specifying a body in PUT requests
* Fix default provider for enum params
* Added yaml test
* Changed docs and fix TODO
* Removing synonyms changes
* Added separate methods for providing default value as suppliers in enums
* Fixed test
* Add a supplier for default value to ensure it's calculated correctly
* Added checks for not specifying a body in PUT requests
* Remove synonyms changes
* Remove some supplier changes
* Better call enumParam with supplier version
* Fix compiler error on supplier
* Apply validators or requires depending on index version
* Solved BWC tests that involved using validators instead of requiresParameters
* Add tests
* Spotless
* Update docs/changelog/98268.yaml
* Update changelog
* Update docs/changelog/98268.yaml
* PR comments
* PR feedback
* Serialize index only for new index versions
---------
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
* Change example field in rule query guide
* Change fuzzy to contains to get tests to work
---------
Co-authored-by: Kathleen DeRusso <kathleen.derusso@elastic.co>
* Sqrt function for ESQL
Introduces a unary scalar function for square root, which is a thin
wrapper over the Java `Math` implementation.
* Fix area for ESQL integration changelog.
* Restore changelog.
* Restore area in changelog.
This PR extends the assumptions we make about database file availability to all database file
names instead of the default ones we host at Elastic. When creating a geo ip processor with
a database name that is not recognized we unilaterally convert the processor to one that
tags documents with a missing database message until the database file requested is
downloaded or provided via the manual configuration route. This allows a pipeline to be
created and for the download service to be started, potentially sourcing the needed files.
---------
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
This commit corrects the following issues with JWT and OIDC `jwkset_path` documentation:
* only https is supported for the JWT realm (OIDC supports both https and http)
* JWT realm does not use a file watcher to reload the file every 5 seconds
* simplify "path" to "file name" (technically it is a resolved path, but 99% of the time it will be just
a file name in the config directory, and "path" is ambiguous)
* remove special mention of using the absolute path in cloud; this is an unnecessary implementation
detail and the only setting (of many) that calls out the cloud config directly by absolute path
* ensure the 2 different JWT documentations are the same
* make mention of when the JWT file will be reloaded (it is not backed by the file watcher, only OIDC is)
* Update post-analytics-collection-event.asciidoc
Update API endpoint. Instead of `click`, it should be `search_click`.
Expanded on request body in the API as well.
* Update post-analytics-collection-event.asciidoc
* Update post-analytics-collection-event.asciidoc
* Update post-analytics-collection-event.asciidoc
This change adds the total dense vector count to the output of the indices stats.
This is useful for observability in order to track the number of indexed vectors
in a cluster.
---------
Co-authored-by: Benjamin Trent <ben.w.trent@gmail.com>
In this PR we enable all new data streams to be managed by the data
stream lifecycle by default. This is implemented by adding an empty
`lifecycle: {}` upon new data stream creation.
Opting out is represented by the `enabled` flag:
```
{
"lifecycle": {
"enabled": false
}
}
```
This change has the following implications for when an index is managed,
and by which feature:
| Parent data stream lifecycle | ILM | `prefer_ilm` | Managed by |
|------------------------------|-----|--------------|------------|
| default | yes | true | ILM |
| default | yes | false | data stream lifecycle |
| default | no | true/false | data stream lifecycle |
| opt-out or missing | yes | true/false | ILM |
| opt-out or missing | no | true/false | unmanaged |
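The table above can be read as a small decision function. The Python sketch below restates it; the function and parameter names are hypothetical and it is not the Elasticsearch code.

```python
def managed_by(lifecycle_enabled: bool, has_ilm_policy: bool, prefer_ilm: bool) -> str:
    """Return which feature manages an index, following the table rows.

    lifecycle_enabled is True for the default lifecycle and False when the
    data stream opted out or has no lifecycle at all.
    """
    if lifecycle_enabled:
        if has_ilm_policy and prefer_ilm:
            return "ILM"
        return "data stream lifecycle"
    return "ILM" if has_ilm_policy else "unmanaged"
```

Note that `prefer_ilm` only matters in the first two rows, where both a default lifecycle and an ILM policy are present.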
Data streams that have been created before the data stream lifecycle is
enabled will not have the default lifecycle.
Next steps: we need to document this when the feature goes GA
(https://github.com/elastic/elasticsearch/issues/97973).
This commit enables concurrent search execution in the DFS phase, which is going to improve resource usage as well as performance of knn queries which benefit from both concurrent rewrite and collection.
We will enable concurrent execution for the query phase in a subsequent commit. While this commit does not introduce parallelism for the query phase, it introduces offloading sequential computation to the newly introduced executor. This is true both for situations where a single slice needs to be searched, as well as scenarios where a specific request does not support concurrency (currently only DFS phase does regardless of the request). Sequential collection is not offloaded only if the request includes aggregations that don't support offloading: composite, nested and cardinality as their post collection method must be executed in the same thread as the collection or we'll trip a lucene assertion that verifies that doc_values are pulled and consumed from the same thread.
## Technical details
This commit introduces a secondary executor, used exclusively to execute the concurrent bits of search. The search threads are still the ones that coordinate the search (where the caller search will originate from), but the actual work will be offloaded to the newly introduced executor.
We are offloading not only parallel execution but also sequential execution, to make the workload more predictable, as it would be surprising to have bits of search executed in either of the two thread pools. Also, that would introduce the possibility to suddenly run a higher amount of heavy operations overall (some in the caller thread and some in the separate threads), which could overload the system as well as make sizing of thread pools more difficult.
Note that fetch, together with other actions, is still executed in the search thread pool. This commit does not make the search thread pool a coordinating-only thread pool; it does so only for the IndexSearcher#search operation itself, which is nonetheless a big portion of the different phases of search API execution.
Given that the searcher blocks waiting for all tasks to be completed, we take a simple approach of introducing a thread pool executor that has the same size as the existing search thread pool but relies on an unbounded queue. This simplifies handling of thread pool queue and rejections. In fact, we'd like to guarantee that the secondary thread pool won't reject, and delegate queuing entirely to the search thread pool which is the entry point for every search operation anyway. The principle behind this is that if you got a slot in the search thread pool, you should be able to complete your search, and rather quickly.
As part of this commit we are also introducing the ability to cancel tasks that have not started yet, so that if any task throws an exception, other tasks are prevented from starting needless computation.
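The offloading pattern described above (a caller that blocks while a secondary executor runs the slices, plus cancellation of tasks that have not started once one fails) can be sketched with Python's `concurrent.futures`. This is an analogy rather than the Elasticsearch implementation: `search_slices` and `search_one` are hypothetical names, and Python's `ThreadPoolExecutor` happens to use an unbounded work queue, which matches the queueing choice described in the text.

```python
from concurrent.futures import FIRST_EXCEPTION, ThreadPoolExecutor, wait


def search_slices(executor, slices, search_one):
    """Run search_one over every slice on the secondary executor.

    The caller blocks until all slices finish. If any slice raises, slices
    that have not started yet are cancelled and the error is re-raised.
    """
    futures = [executor.submit(search_one, s) for s in slices]
    done, not_done = wait(futures, return_when=FIRST_EXCEPTION)
    failed = next((f for f in done if f.exception() is not None), None)
    if failed is not None:
        for f in not_done:
            f.cancel()  # only succeeds for tasks that have not started yet
        raise failed.exception()
    return [f.result() for f in futures]
```

A caller would create the executor once, for example `ThreadPoolExecutor(max_workers=4)`, and reuse it across searches. Tasks that are already running when a failure is observed are not interrupted in this sketch.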
Relates to #80693
Relates to #90700
Update the ml and transform reference documentation to provide information regarding the new versioning schemes independent from the product versions.
Co-authored-by: István Zoltán Szabó <istvan.szabo@elastic.co>
Fleet is currently hard coded to set index.codec to best_compression (deflate compression). This is good for most data streams, except for data streams where tsdb is enabled. Ideally Fleet wouldn't need to set this setting at all and Elasticsearch's default would be good enough, but unfortunately this isn't the case. It defaults to default (lz4, optimised for speed), which would mean much higher disk space usage. Ideally the default would be default when synthetic source is enabled and best_compression otherwise. Changing this now would be a breaking change.
Instead, Fleet would like to depend on Elasticsearch's internal component templates, to at least abstract some of the internal details away. The metrics-settings template is ok for non-tsdb data streams, but there is no component template for tsdb metrics. This PR adds one.
Relates to elastic/kibana#160288
* Update field-mapping.asciidoc to state that epoch formats are not supported as dynamic date formats
* Update docs/reference/mapping/dynamic/field-mapping.asciidoc
Co-authored-by: Abdon Pijpelink <abdon.pijpelink@elastic.co>
---------
Co-authored-by: Abdon Pijpelink <abdon.pijpelink@elastic.co>
* [DOC+] ILM min_age interpretation
👋 hiya, team!
From the time y'all helped me write this [ILM Troubleshooting blog](https://www.elastic.co/blog/troubleshooting-elasticsearch-ilm-common-issues-and-fixes) in 2021 and we later ported errors to [this doc](https://www.elastic.co/guide/en/elasticsearch/reference/master/index-lifecycle-error-handling.html) via https://github.com/elastic/elasticsearch/issues/75849, the remaining top gotcha users raise is "Common issue 3": ILM's `min_age` is calculated from rollover time, falling back to index creation time.
This PR cross-pollinates the blog quote into the docs so that Support can link it to users and so it becomes Google-able.
> Common issue 3: min_age calculation clarification
> When working with customers, I have seen confusion about how min_age works. The min_age must increase between subsequent phases. If rollover is used, min_age is calculated off the rollover date. This is because rollover generates a new index and the new index’s creation date is used in the calculation. Otherwise, min_age is calculated off the original index’s creation date.
* Apply suggestions from code review
---------
Co-authored-by: Abdon Pijpelink <abdon.pijpelink@elastic.co>
* Add tests for query rules
* More tests
* Fix search app tests
* Fix tests
* Add teardown to tests
* Add tests for list search apps call
* Update test in get search application
* Tweak stack trace
* Make response match in test
---------
Co-authored-by: carlosdelest <carlos.delgado@elastic.co>
Here we enable aggregations previously not allowed on fields of type counter.
The decision of enabling such aggregations even if the result is "meaningless"
for counters has been taken to favour TSDB adoption.
Aggregations now allowed, other than the existing ones, include:
* avg
* box plot
* cardinality
* extended stats
* median absolute deviation
* percentile ranks
* percentiles
* stats
* sum
* value count
I included tests for the weighted average and matrix stats aggregations too.
Resolves #97882
Today by default the `SEARCH_COORDINATION` pool is sized at half the
allocated processors, or five if there are more than ten CPUs. Yet, if
we scale up a node to have more than ten CPUs, we probably want to scale
up the number of search coordination threads to match. This commit
removes the limit of five threads.
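The sizing change can be sketched as two small Python functions. The names are hypothetical, and the exact rounding of "half the allocated processors" (here rounded up, with a minimum of one thread) is an assumption not stated in the text.

```python
def coordination_pool_size_before(allocated_processors: int) -> int:
    """Half the processors, capped at five threads (old behaviour)."""
    half = max(1, (allocated_processors + 1) // 2)  # rounding up is an assumption
    return min(half, 5)


def coordination_pool_size_after(allocated_processors: int) -> int:
    """Half the processors, with no upper cap (new behaviour)."""
    return max(1, (allocated_processors + 1) // 2)
```

Under these assumptions a 16-CPU node goes from 5 coordination threads to 8, while nodes with ten or fewer CPUs are unaffected.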
For CCS searches with ccs_minimize_roundtrips=true, when an error is returned, it is unclear which cluster
caused the problem. This commit adds additional accounting and error information to the search response
for each cluster involved in a cross-cluster search.
The _clusters section of the SearchResponse has a new details section added with an entry for each cluster
(remote and local). It includes status info, shard accounting counters and error information that are added
incrementally as the search happens.
The search on each cluster can be in one of 5 states:
RUNNING
SUCCESSFUL - all shards were successfully searched (successful or skipped)
PARTIAL - some shard searches failed, but at least one succeeded and partial data has been returned
SKIPPED - no shards were successfully searched (all failed or cluster unavailable) when skip_unavailable=true
FAILED - no shards were successfully searched (all failed or cluster unavailable) when skip_unavailable=false
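One way to read the five states is as a function of shard counters and the `skip_unavailable` setting. The Python sketch below is an interpretation of the list above, not the `SearchResponse` code: the function name, its parameters, and the treatment of an unreachable cluster as "zero shards searched" are assumptions.

```python
from enum import Enum


class Status(Enum):
    RUNNING = "running"
    SUCCESSFUL = "successful"
    PARTIAL = "partial"
    SKIPPED = "skipped"
    FAILED = "failed"


def cluster_status(running, shards_total, shards_ok, skip_unavailable, reachable=True):
    """Derive a per-cluster status.

    shards_ok counts shards that were searched successfully or skipped.
    An unreachable cluster is treated as having no searched shards.
    """
    if running:
        return Status.RUNNING
    if reachable and shards_ok > 0:
        if shards_ok == shards_total:
            return Status.SUCCESSFUL
        return Status.PARTIAL
    return Status.SKIPPED if skip_unavailable else Status.FAILED
```

The only difference between SKIPPED and FAILED in this reading is the `skip_unavailable` flag of the cluster in question.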
A new SearchResponse.Cluster object has been added. Each TransportSearchAction.CCSActionListener
(one for each cluster) has a reference to a separate Cluster instance and updates once it gets back
information from its cluster.
The SearchResponse.Clusters object only uses the new Cluster object for CCS minimize_roundtrips=true.
For local-only searches and CCS minimize_roundtrips=false, it uses the current Clusters object as before.
Follow-on work will change CCS minimize_roundtrips=false to also use the new Cluster model and update
state in the _clusters/details section.
The Cluster objects are immutable, so a CAS operation is required to swap in new state to the
map of Cluster objects held by the `SearchResponse.Clusters` class. This concurrency model is
a little bit of overkill for the minimize_roundtrips=true use case, but it will be necessary for
supporting minimize_roundtrips=false, since updates there will be done per shard, not per cluster.
* [DOCS] Fix formatting issue in cardinality-aggregation.asciidoc
Fixes a header not rendering properly because of a missing newline.
* Update docs/reference/aggregations/metrics/cardinality-aggregation.asciidoc
* More issues
* More issues
* Update docs/reference/aggregations/metrics/cardinality-aggregation.asciidoc
Co-authored-by: James Rodewig <james.rodewig@elastic.co>
---------
Co-authored-by: James Rodewig <james.rodewig@elastic.co>
* [DOCS] Update manual downsampling documentation to use TSDS
* Swap manual and ILM downsampling examples in nav
* Typo
* Update prerequisites based on review feedback
* Warn against deleting the old backing index.
* Clarify counter/gauge results
* Mention that the downsampled type is 'aggregate_metric_double'
This adds the `to_degrees` and `to_radians` functions. It uses the
"convert" function framework because that just felt right - these
convert between radians and degrees after all.
* Stop returning the indices list for GET search app
* Stop returning the indices list for the list search app API
* Stop storing indices list
* Remove indices from system index mapping
* Check for alias in PUT rest tests
* Documentation changes
* Do not check for alias existence since we are already doing get alias
This adds the `query` to the main ESQL task so you can see long running
queries. And adds some docs about it including an example of cancelling
a query.
---------
Co-authored-by: Abdon Pijpelink <abdon.pijpelink@elastic.co>
This adds support for numeric fields to `auto_bucket` and adds a new
`floor` function to round numeric values down to the nearest integer. That
function is exposed because it's probably useful. I added it in this PR
because `auto_bucket` uses it as an implementation detail as well.
Elasticsearch multi-target syntax for indices allows excluding an index with the "-" sign.
Public docs: https://www.elastic.co/guide/en/elasticsearch/reference/current/api-conventions.html#api-multi-index
This commit expands that functionality to index expressions that include a cluster alias.
For example:
POST logs*,*:logs*,-remote4:*,-remote1*:*/_async_search
This would search all remote clusters except remote4 and any cluster matching `remote1*` (remote1, remote11, remote12, remote13, etc.).
A singleton wildcard is required in the index position of the `cluster:index` expression,
to specify that we are excluding the entire cluster. This is useful when a cluster
is down or slow during CCS searches.
Excluding a subset of indexes on a remote cluster is not supported in this commit.
For example, this will throw an error:
POST logs*,*:logs*,-remote4:logs*/_async_search
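The exclusion rules above can be sketched as a small resolver. This is a simplified Python model, not the Elasticsearch parser: `resolve_remote_clusters` and `known_remotes` are hypothetical names, glob matching is done with `fnmatch`, and the order of expressions is ignored, which is an assumption.

```python
from fnmatch import fnmatchcase


def resolve_remote_clusters(expressions, known_remotes):
    """Return the set of remote clusters selected by the index expressions."""
    included, excluded = set(), set()
    for expr in expressions:
        negated = expr.startswith("-")
        body = expr[1:] if negated else expr
        if ":" not in body:
            continue  # local index expression, irrelevant for cluster selection
        cluster_pattern, index_pattern = body.split(":", 1)
        matches = {c for c in known_remotes if fnmatchcase(c, cluster_pattern)}
        if negated:
            if index_pattern != "*":
                # Excluding only some indices of a remote cluster is not supported.
                raise ValueError("cluster exclusion requires '*' as the index: " + expr)
            excluded |= matches
        else:
            included |= matches
    return included - excluded
```

With remotes `remote1`, `remote2`, `remote4` and `remote11`, the first request above would resolve to `remote2` only, and the second request would raise an error.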
When ILM is mounting a searchable snapshot in the frozen tier, we chose to ignore the `total_shards_per_node` setting because it's very likely to block shards from being assigned. Usually `total_shards_per_node` is configured for one of the previous tiers, which usually have more nodes than the frozen tier.
This adds a new ES|QL endpoint, `_query`, to replace the now deprecated
`_esql`. The latter is still kept for a while, emitting a deprecation
warning.
Fixes ESQL-1379.
* Index Management now has link to Discover in UI.
* updating screenshot for data streams section
* Update docs/reference/indices/index-mgmt.asciidoc
Co-authored-by: Abdon Pijpelink <abdon.pijpelink@elastic.co>
* Update docs/reference/indices/index-mgmt.asciidoc
Co-authored-by: Abdon Pijpelink <abdon.pijpelink@elastic.co>
---------
Co-authored-by: Abdon Pijpelink <abdon.pijpelink@elastic.co>
To implement this we:
* Cast both arguments to double
* Perform integer and long validation on the double results before casting back to integer or long
* Perform a special case validation for exponent==1
* Any validation failures result in ArithmeticException, which is caught and added to warnings
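The steps above can be sketched in Python for the integer case. This is a rough analogy: `pow_int` is a hypothetical name, Python's `ArithmeticError` stands in for Java's `ArithmeticException`, the 32-bit range check is an assumption about what "integer validation" means, and the special case for exponent==1 is not modelled.

```python
import math

INT_MIN, INT_MAX = -2**31, 2**31 - 1


def pow_int(base: int, exponent: int, warnings: list):
    """Compute base**exponent via doubles; return None and record a warning on failure."""
    try:
        # Cast both arguments to double and compute in floating point.
        result = math.pow(float(base), float(exponent))
        # Validate the double result before casting back to an integer.
        if math.isnan(result) or math.isinf(result) or not (INT_MIN <= result <= INT_MAX):
            raise ArithmeticError("integer overflow")
        return int(result)
    except (ArithmeticError, ValueError) as e:
        # Validation failures become warnings instead of failing the query.
        warnings.append(str(e))
        return None
```

Calling `pow_int(2, 31, warnings)` returns `None` and appends one warning, because the result does not fit in a signed 32-bit integer.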
In #87246 we describe some reasons why it's a good idea to limit the doc
count of a shard, and we started to do so in #94065, so this commit
adjusts the sizing guidance docs to match.
The index and transport versions are low level details of how a node
behaves in a cluster. They were recently added to the main endpoint
response, but they are too low level and should be moved to another
endpoint TBD.
This commit removes those versions from the main endpoint response. Due
to the fact lucene version is now derived from index version, this
commit also adds an explicit lucene version member to the main response.
The completion_time is set to the start_time (already present) plus the 'took'
time that is set in the SearchResponse object, but only when isRunning == false,
since took is set even for in-progress searches.
We use the 'took' field because it is based on relative time, not absolute wall clock time
which can go backwards due to NTP issues. See the comments in TransportSearchAction about
the SearchTimeProvider for details.
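The rule can be written down in a few lines of Python. The function and parameter names are hypothetical and times are assumed to be in milliseconds.

```python
def completion_time_ms(start_time_ms: int, took_ms: int, is_running: bool):
    """Return start + took for finished searches, or None while still running."""
    if is_running:
        return None  # took is set for in-progress searches too, so it is ignored here
    return start_time_ms + took_ms
```

Because `took` is measured with a relative clock, the derived completion time can never precede the start time, even if the wall clock is adjusted during the search.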
Closes #88640
Before we used to track max_score in collapse when requested (track_scores=true)
or when there is no sort in collapse (see PR #27122). But this feature
was lost through refactoring and changes.
This PR restores this feature.
Closes #97653
Add mask_token field to fill_mask of _ml/trained_models.
This change will enable users and Kibana to get the particular mask tokens needed for deployed models by adding a mask_token field to the GET _ml/trained_models API, as an enhancement to support kibana#159577.
There are situations in which the terminate_after functionality causes
the collection to keep on going although there is nothing to collect,
with the only goal of incrementing the counter of collected docs and
eventually early terminating which sets the `terminated_early` flag
in the search response to true.
When docs collection early terminates, we should rather honor the
corresponding `CollectionTerminatedException` that is thrown, and
adjust expectations around the fact that `terminate_after` affects
actual collection of documents, meaning that it can't be honored if
the threshold has not been reached by the time the collection
terminates early for other reasons.
This commit adjusts the QueryPhaseCollector behavior to do that, which
allows for some additional simplifications.
Closes #97269
Today the `current_node` parameter is given in several sample requests
illustrating how to explain an unassigned shard using the cluster
allocation explain API. This doesn't make sense: an unassigned shard has
no `current_node`. This commit removes the misleading parameter in these
cases.
Added a clusterAlias to the Painless execute Request object, so that index
expressions in the request of the form "myremote:myindex" will be parsed to
set clusterAlias to "myremote" and the index to "myindex".
If clusterAlias is null, then it is executed against a shard on the local cluster, as before.
If clusterAlias is non-null, then the SingleShardTransportAction is sent to the remote cluster,
where it will run the full request (doing remote coordination). Note that the new clusterAlias
field is not Writeable, so when the request is sent to the remote cluster, that cluster only sees
the index name, not the clusterAlias (which it wouldn't know how to handle correctly).
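The parsing step can be illustrated with a short Python helper. The name `split_index_expression` is hypothetical, and the sketch assumes the expression contains at most one cluster separator.

```python
def split_index_expression(expression: str):
    """Split "cluster:index" into (cluster_alias, index); alias is None for local indices."""
    cluster, sep, index = expression.partition(":")
    if sep:
        return cluster, index
    return None, expression
```

A `None` alias corresponds to the local-cluster path described above, and a non-`None` alias to the request being forwarded to the remote cluster.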
Added PainlessExecuteIT test that tests cross-cluster calls
Updated painless-execute-script end user docs to indicate support for cross-cluster executions
Today we document that tasks may not react to cancellations immediately,
but in practice it's surprising to users and kind of a bug if they run
for too long after being cancelled. This commit adds a little extra
detail about the information to collect to troubleshoot such a
situation.
Currently the size of the prefix passed to the _terms_enum endpoint is not limited.
Since the request runs against a keyword field and builds automata, this can lead to high memory
consumption and the danger of running OOM. This change checks the size of the prefix
early in the rest request and throws a validation error in case it exceeds
IndexWriter.MAX_TERM_LENGTH, which is the same limit we apply to the length of
keyword field values anyway, so this comes at no loss in functionality.
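A minimal Python sketch of such an early check follows. `validate_prefix` is a hypothetical name, the limit value 32766 is Lucene's `IndexWriter.MAX_TERM_LENGTH`, and measuring the prefix in UTF-8 bytes rather than characters is an assumption, since the text does not say which is used.

```python
MAX_TERM_LENGTH = 32766  # value of Lucene's IndexWriter.MAX_TERM_LENGTH


def validate_prefix(prefix: str) -> str:
    """Reject over-long prefixes before any automaton is built."""
    size = len(prefix.encode("utf-8"))  # counting bytes is an assumption
    if size > MAX_TERM_LENGTH:
        raise ValueError(
            "prefix size %d exceeds the maximum of %d" % (size, MAX_TERM_LENGTH)
        )
    return prefix
```

Failing fast at request validation keeps the expensive automaton construction from ever seeing an oversized input.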
Closes #96572
Discovery, like cluster membership, can also be affected by network-like
issues (e.g. GC/VM pauses, dropped packets and blocked threads) so this
commit duplicates the troubleshooting info across both places.
- Adds the TOC to the Elasticsearch docs landing page. Removes the right sidebar from the landing page.
- Removes the "View all Elastic docs" link from the bottom of the landing page
Co-authored-by: Abdon Pijpelink <abdon.pijpelink@elastic.co>
Introduces a unary scalar function for base 10 log, which is a thin
wrapper over the Java `Math` implementation.
---------
Co-authored-by: Abdon Pijpelink <abdon.pijpelink@elastic.co>