Adds a new API that allows a user to reset
an anomaly detection job.
To use the API, call:
```
POST _ml/anomaly_detectors/<job_id>/_reset
```
The API removes all data associated with the job.
In particular, it deletes model state, results, and stats.
However, job notifications and user annotations are not removed.
Also, the API can be called asynchronously by setting the parameter
`wait_for_completion` to `false` (it defaults to `true`). When run
that way, the API returns the task ID for further monitoring.
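For example, an asynchronous reset might look like this (the job ID is a placeholder):
```
POST _ml/anomaly_detectors/my_job/_reset?wait_for_completion=false
```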
To prevent the job from opening while it is resetting,
a new job field called `blocked` has been added. It is an object
that contains a `reason` and the `task_id`. `reason` can take
a value from ["delete", "reset", "revert"] as all these
operations should block the job from opening. The `task_id` is also
included in order to allow tracking the task if necessary.
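While such an operation is in progress, the job configuration might include something along these lines (the task ID shown is illustrative only):
```
"blocked": {
  "reason": "reset",
  "task_id": "oTUltX4IQMOUUVeiohTt8A:39"
}
```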
Finally, this commit also sets the `blocked` field when
the revert snapshot API is called as a job should not be opened
while it is being reverted to a different model snapshot.
It is useful to know the following information when reading datafeed stats:
- Is the datafeed a "real-time" datafeed, i.e. a datafeed without a configured `end` time
- Has the datafeed processed all past data available at the time of starting.
This object is only available if the datafeed task has been created.
It has the form:
```
"running_state": {
"is_real_time": <boolean>,
"look_back_finished": <boolean>
}
```
Changes:
* Removes a reference to the
`-XX:+UnlockDiagnosticVMOptions -XX:+PrintCompressedOopsMode` JVM option. This
option is no longer supported.
* Combines `Xms/Xmx` recommendations for compressed oops.
Closes #71644.
Co-authored-by: Rick Boyd <boyd.richardj@gmail.com>
Adds a new keep_values gap policy that works like skip, except if the metric
calculated on an empty bucket provides a non-null non-NaN value, this value is
used for the bucket.
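As an illustrative sketch (field names and the surrounding aggregations are invented), the new policy is selected through the existing `gap_policy` parameter of a pipeline aggregation:
```
"aggs": {
  "sales_per_month": {
    "date_histogram": { "field": "date", "calendar_interval": "month" },
    "aggs": {
      "total_sales": { "sum": { "field": "price" } },
      "sales_deriv": {
        "derivative": {
          "buckets_path": "total_sales",
          "gap_policy": "keep_values"
        }
      }
    }
  }
}
```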
Fixes #27377
Co-authored-by: Mark Tozzi <mark.tozzi@gmail.com>
Changes:
* Combines the `Document counts are approximate` and `Calculating document count
error` sections.
* Rewrites the section to include `sum_other_doc_count` and
`doc_count_error_upper_bound` for easier on-page (ctrl+f) searching.
Closes #73200
Improve the error message when inconsistent mappings cause doc value formatting errors. For example, trying to format a binary encoded IP address as a UTF8 string often fails with something unexpected, like `ArrayIndexOutOfBounds`. This change catches that and wraps it with a message suggesting the user check their mappings. Also gets rid of anonymous instances for doc value formatters, which made it hard to see what format was failing to be applied.
In #55805, we added a setting to allow single data node clusters to
respect the high watermark. In #73733 we added the related deprecations.
This commit ensures the only valid value for the setting is true and
adds deprecations if the setting is set. The setting will be removed
in a future release.
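For reference, a sketch of the resulting configuration in `elasticsearch.yml`, assuming the setting introduced in #55805 is `cluster.routing.allocation.disk.watermark.enable_for_single_data_node`:
```
# true is the only accepted value; setting it at all is deprecated
cluster.routing.allocation.disk.watermark.enable_for_single_data_node: true
```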
Co-authored-by: David Turner <david.turner@elastic.co>
This commit adds a short note to the docs on repository backups
indicating that the repository must not be modified while registered, so
that a restore from a repository backup must complete before
registration.
Relates #73730
This adds a new pipeline aggregation for calculating a Kolmogorov–Smirnov test for a given sample and buckets path.
For now, the buckets path resolution needs to be `_count`. But, this may be relaxed in the future.
It accepts a parameter `fractions` that indicates the distribution of documents from some other pre-calculated sample.
This particular version of the K-S test is Two-sample, meaning it calculates whether the `fractions` and the distribution of `_count` values in the buckets_path are taken from the same distribution.
This, in combination with the hypothesis alternatives (`less`, `greater`, `two_sided`) and sampling logic (`upper_tail`, `lower_tail`, `uniform`), allows for flexibility and usefulness when comparing two samples and determining the likelihood of them being from the same overall distribution.
Usage:
```
POST correlate_latency/_search?size=0&filter_path=aggregations
{
"aggs": {
"buckets": {
"terms": { <1>
"field": "version",
"size": 2
},
"aggs": {
"latency_ranges": {
"range": { <2>
"field": "latency",
"ranges": [
{ "to": 0.0 },
{ "from": 0, "to": 105 },
{ "from": 105, "to": 225 },
{ "from": 225, "to": 445 },
{ "from": 445, "to": 665 },
{ "from": 665, "to": 885 },
{ "from": 885, "to": 1115 },
{ "from": 1115, "to": 1335 },
{ "from": 1335, "to": 1555 },
{ "from": 1555, "to": 1775 },
{ "from": 1775 }
]
}
},
"ks_test": { <3>
"bucket_count_ks_test": {
"buckets_path": "latency_ranges>_count",
"alternative": ["less", "greater", "two_sided"]
}
}
}
}
}
}
```
This commit adds the ability to specify exclusion patterns in Auto-Follow patterns. This allows excluding indices that match any of the inclusion patterns and also match some of the exclusion patterns, giving more fine-grained control in scenarios where this is important.
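A sketch of what such a pattern could look like (the names are placeholders, and the exclusion field name is an assumption based on this change):
```
PUT _ccr/auto_follow/my-pattern
{
  "remote_cluster": "leader",
  "leader_index_patterns": [ "metrics-*" ],
  "leader_index_exclusion_patterns": [ "metrics-system-*" ],
  "follow_index_pattern": "{{leader_index}}"
}
```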
Related #67686
The value of `*.ssl.client_authentication` is `required` for
everything except `xpack.security.http.ssl.client_authentication`, for
which it is `none`.
The doc template for this setting was configured to have a default
value, and allow an override. However, the default was set to `none`
when it should have been `required`.
The override for `http` was correctly set to `none` (but that didn't
really do anything, since that was the same as the default).
This commit changes the default to `required`, which matches the code
(see `XPackSettings.CLIENT_AUTH_DEFAULT`), and leaves the override for
http as `none` (see `XPackSettings.HTTP_CLIENT_AUTH_DEFAULT`).
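Expressed as explicit `elasticsearch.yml` settings, the documented defaults are effectively (the transport context is shown as one representative example):
```
# default for most SSL contexts
xpack.security.transport.ssl.client_authentication: required
# exception: the HTTP layer defaults to none
xpack.security.http.ssl.client_authentication: none
```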
* Add new thread pool for critical operations
* Split critical thread pool into read and write
* Add POJO to hold thread pool names
* Add tests for critical thread pools
* Add thread pools to data streams
* Update settings for security plugin
* Retrieve ExecutorSelector from SystemIndices where possible
* Use a singleton ExecutorSelector
* [DOCS] Add retrieving from flattened fields
* Clarify sub-field syntax
* Moving sub-field retrieval to flattened field docs
* Remove full example and de-emphasize runtime fields
* Remove extraneous sample tag
With
230b860d95,
the `elastic/tap/elasticsearch-oss` tap was removed from Homebrew. This
removes outdated references to the tap from our docs.
It also notes that Homebrew installs the latest version of Elasticsearch.
Changes:
* Revises the size your shards guide to use a 50GB shard guideline. This better aligns with our default in the ILM policy UI.
* Updates the language to indicate that the 50GB shard guideline is not a hard limit. Larger shards may work depending on the network and use case.
Reverts some changes added in #71367.
Changes:
* Reuses the same `aliases` object properties in the following API docs:
* Clone index API
* Create index API
* Put component template API
* Put legacy index template API
* Put index template API
* Rollover index API
* Shrink index API
* Simulate template API
* Split index API
* Updates the `aliases` object properties for the simulate index API docs.
Closes #73044
Recent JDK releases have disabled TLS v1.0 and TLS v1.1 by default.
See
- https://java.com/en/jre-jdk-cryptoroadmap.html
- https://bugs.openjdk.java.net/browse/JDK-8202343
This change adds documentation clarifying which TLS versions are
supported on which JDKs (in general terms, rather than specific builds)
and how to change the configuration if necessary.
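For example, the supported protocols can be pinned explicitly per SSL context; a sketch for the HTTP layer (other contexts have the equivalent setting):
```
xpack.security.http.ssl.supported_protocols: ["TLSv1.3", "TLSv1.2", "TLSv1.1"]
```
Note that re-enabling TLS v1.1 may also require changes to the JDK's `java.security` configuration.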
Co-authored-by: Adam Locke <adam.locke@elastic.co>
Relates to #70755.
The main changes of this PR are:
Add an optional _meta field to ILM policy.
Add some test code for the change.
Update the docs of the Create or update lifecycle policy API.
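A minimal sketch of a policy that uses `_meta` (the policy name and metadata content are illustrative):
```
PUT _ilm/policy/my-policy
{
  "policy": {
    "_meta": {
      "description": "used for nginx log",
      "project": { "name": "myProject" }
    },
    "phases": {
      "hot": {
        "actions": { "rollover": { "max_age": "30d" } }
      },
      "delete": {
        "min_age": "90d",
        "actions": { "delete": {} }
      }
    }
  }
}
```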
Changes:
* Updates the write index snippet to use data streams
* Notes that data stream aliases don't set an implicit write data stream, even if the alias points to only one data stream.
Categorization jobs created once the entire cluster is upgraded to
version 7.14 or higher will default to using the new ml_standard
tokenizer rather than the previous default of the ml_classic
tokenizer, and will incorporate the new first_non_blank_line char
filter so that categorization is based purely on the first non-blank
line of each message.
The difference between the ml_classic and ml_standard tokenizers
is that ml_classic splits on slashes and colons, so creates multiple
tokens from URLs and filesystem paths, whereas ml_standard attempts
to keep URLs, email addresses and filesystem paths as single tokens.
It is still possible to configure the ml_classic tokenizer if you
prefer: just provide a categorization_analyzer within your
analysis_config and whichever tokenizer you choose (which could be
ml_classic or any other Elasticsearch tokenizer) will be used.
To opt out of using first_non_blank_line as a default char filter,
you must explicitly specify a categorization_analyzer that does not
include it.
If no categorization_analyzer is specified but categorization_filters
are specified, then the categorization filters are converted to char
filters that are applied after first_non_blank_line.
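For example, to keep the old behaviour an analysis config along these lines could be used (field names are placeholders):
```
"analysis_config": {
  "categorization_field_name": "message",
  "categorization_analyzer": {
    "char_filter": [ "first_non_blank_line" ],
    "tokenizer": "ml_classic"
  },
  "detectors": [
    { "function": "count", "by_field_name": "mlcategory" }
  ]
}
```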
Closes elastic/ml-cpp#1724
Today when upgrading to the next major version we have a so-called
_major version barrier_: once the cluster comprises nodes of the new
major version then nodes of the previous major version are prevented
from joining the cluster. This means we can be certain that
`clusterState.nodes().getMinNodeVersion().major` will never decrease, so
we can implement upgrade logic that relies on the cluster remaining in
its wholly-upgraded state.
This commit generalises this behaviour to apply to all upgrades, so that
we can be certain that `clusterState.nodes().getMinNodeVersion()` will
never decrease in a running cluster.
Closes #72911
This allows indexing documents into a data stream alias.
The ingestion is then forwarded to the write index of the data stream
that is marked as the write data stream.
The `is_write_index` parameter can be used to indicate which data stream is the write data stream
when updating or adding a data stream alias.
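A sketch of both steps, with placeholder data stream and alias names:
```
POST _aliases
{
  "actions": [
    { "add": { "index": "logs-app1", "alias": "logs", "is_write_index": true } },
    { "add": { "index": "logs-app2", "alias": "logs" } }
  ]
}

POST logs/_doc
{
  "@timestamp": "2021-05-20T00:00:00Z",
  "message": "hello"
}
```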
Relates to #66163
* [DOCS] Create a new page for dissect in scripting docs
* Expanding a bit more
* Adding a section for using dissect patterns
* Adding tests
* Fix test cases and other edits
* [DOCS] Moving grok to its own scripting page
* Adding examples
* Updating cross link for grok page
* Adds same runtime field in a search request for #73262
* Clarify titles and shift navigation
* Incorporating review feedback
* Updating cross-link to Painless
* [DOCS] Expand information on when to use a runtime field without a script
* Reworking information based on review feedback
* Clarify case where doc_values are disabled
* A few minor changes from review feedback
These settings were deprecated in 7.13+ in #72835 and are now removed by this commit.
This commit also ensures that the settings are removed from index metadata when the metadata is
loaded. The reason for this is that if we allow the settings to remain (because they are not
technically "invalid"), then the index will not be able to be allocated, because the
FilterAllocationDecider will be looking for nodes with the _tier attribute.
Changes:
* Expands the `aliases` parameter for the create index API to better document
supported properties.
* Reuses `aliases` parameter in the following API docs:
* Clone index API
* Shrink index API
* Split index API
Updates an outdated reference to empty `data_stream` objects. The create index
template API's `data_stream` parameter now supports the `hidden` property.
Changes:
* Reuses and reorders the index template API's body parameters in the simulate template API docs.
* Replaces several includes with a shorter xref.
* Reformats a sidebar on naming collisions with built-in index templates.
Updates the exists API docs to better reflect its support of data streams and
aliases.
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
This documents how to configure the proxy support for ODBC.
It also removes the documentation of the connection string values; these
are now all covered by the GUI settings.
Co-authored-by: Andrei Stefan <astefan@users.noreply.github.com>
Fixed the autoscaling docs to no longer refer to partially mounted indices or
shards as frozen indices/shards; they now consistently say partially mounted
indices or shards.
Closes #73132
Co-authored-by: James Rodewig <40268737+jrodewig@users.noreply.github.com>
This change adds support for using `search_after` with field collapsing. When
using these in conjunction, the same field must be used for both sorting and
field collapsing. This helps keep the behavior simple and predictable.
Otherwise it would be possible for a group to appear on multiple pages of
results.
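A rough sketch of the intended usage (index and field names invented):
```
GET /tweets/_search
{
  "query": { "match": { "message": "elasticsearch" } },
  "collapse": { "field": "user.id" },
  "sort": [ { "user.id": "asc" } ],
  "search_after": [ "dd5ce1ad" ]
}
```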
Currently search after is handled directly in `CollapsingTopDocsCollector`. As
a follow-up, we could generalize the logic and move support to the Lucene
grouping framework.
Closes #53115.
Today the docs indicate that restoring a snapshot with
`include_global_state` set will merge the ingest pipelines, ILM
policies, settings etc in the snapshot with those already in the
cluster. This isn't the case, we simply replace all the things. This
commit corrects the docs.
The get alias API should take into account the aliases parameter when
returning aliases that refer to data streams, and should not return entries
for data streams that don't have any aliases pointing to them.
Relates to #66163
Adds a new snapshot meta pool that is used to speed up the get snapshots API
by making `SnapshotInfo` load in parallel. Also use this pool to load
`RepositoryData`.
A follow-up to this would expand the use of this pool to the snapshot status
API and make it run in parallel as well.
The current search API documentation doesn't include any examples of query
parameter usage.
This updates the docs to include a simple syntax example using the `from` and
`size` query parameters.
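For example (the index name is a placeholder):
```
GET /my-index-000001/_search?from=40&size=20
```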
Changes:
* Removes an error in the create SLM policy API's `schedule` parameter
definition. `schedule` is not used to delete expired snapshots.
* Updates the `expire_after` parameter definition to mention the
`slm.retention_schedule` cluster setting.
This commit adds a `cancelled` flag to each cancellable task in the
response to the list tasks API, allowing users to see that a task has
been properly cancelled and will complete as soon as possible.
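A task in the response might then look roughly like this (abridged, with illustrative values):
```
"oTUltX4IQMOUUVeiohTt8A:123": {
  "node": "oTUltX4IQMOUUVeiohTt8A",
  "id": 123,
  "action": "indices:data/read/search",
  "cancellable": true,
  "cancelled": true
}
```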
Closes #72907
If a node is partitioned away from the rest of the cluster then the
`ClusterFormationFailureHelper` periodically reports that it cannot
discover the expected collection of nodes, but does not indicate why. To
prove it's a connectivity problem, users must today restart the node
with `DEBUG` logging on `org.elasticsearch.discovery.PeerFinder` to see
further details.
With this commit we log messages at `WARN` level if the node remains
disconnected for longer than a configurable timeout, which defaults to 5
minutes.
Relates #72968
Adds an optional parameter to the _terms_enum request designed to allow paging.
The last term from a previous result can be passed as the `search_after` parameter to a subsequent request, meaning only terms after the given term (but still matching the provided string prefix) are returned.
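A sketch of a follow-up page request (index, field, and terms are placeholders):
```
POST stackoverflow/_terms_enum
{
  "field": "tags",
  "string": "kib",
  "search_after": "kibana"
}
```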
Relates to #72910
* wip
* Service Accounts - add beta documentation
* consistent names
* fix test
* Update service accounts overview and token creation files.
* Rename get service tokens to get service credentials
* fix tests
* Changes for create and get service tokens.
* Changes for get token creds, delete token, clear token cache, and token auth.
* add manage_service_account privilege to list
* List service accounts APIs
* Move xpack setting to Security API page, plus other cleanup.
* Shorten secret tokens in examples, add cross links, plus other cleanup.
* Clarifying parameter descriptions.
* Clarify language for authenticating with a token.
* Tweaks
* Typo fix
* Adding redirects to work around CI build checks
* Revert "Adding redirects to work around CI build checks"
This reverts commit 20a1b53591.
* Remove redirects that were implemented to satisfy CI checks in master branch
* Move note about not supporting basic auth
* Clarify what service accounts are specifically for
* Apply suggestions from code review
Co-authored-by: Tim Vernum <tim@adjective.org>
* Addressing review feedback
* tweak
* Improve doc tests
* fix test
Co-authored-by: Adam Locke <adam.locke@elastic.co>
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
Co-authored-by: Tim Vernum <tim@adjective.org>
* Deprecate single-tier allocation filtering settings
`(index|cluster).routing.allocation.(include|exclude|require)._tier` settings are now deprecated in
favor of using `index.routing.allocation.include._tier_preference` (see the example below).
* Update deprecation message
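For example, a migration to the preferred setting might look like this (the index name and tier list are placeholders):
```
PUT my-index-000001/_settings
{
  "index.routing.allocation.include._tier_preference": "data_warm,data_hot"
}
```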
We currently don't support `copy_to` for fields that take the form of objects
(e.g. `date_range` or certain kinds of `geo_point` variants). The current
problem with objects is that when DocumentParser parses anything other than
single values, it potentially advances the underlying parser past the value that
we would need to stay on for parsing the value again. While we might want to
support this in the future, for now this PR enhances the otherwise confusing
MapperParsingException with something more helpful and adds a short note in the
documentation about this restriction.
Closes #49344
Aliases to data streams can be defined via the existing update aliases api.
Aliases can either only refer to data streams or to indices (not both).
Also the existing get aliases api has been modified to support returning
aliases that refer to data streams.
Aliases for data streams are stored separately from data streams and
refer to data streams by name, not to the backing indices of
a data stream. This means that when backing indices are added to or removed
from a data stream, the data stream alias doesn't need to be
updated.
The authorization model for aliases that refer to data streams is the
same as for aliases that refer to indices. In security, privileges can
be defined on aliases, indices, and data streams. When a privilege is
granted on an alias, access is also granted on the indices that
the alias refers to (regardless of whether privileges are granted or denied
on the actual indices). The same applies to aliases that refer
to data streams. For more details, see:
https://github.com/elastic/elasticsearch/issues/66163#issuecomment-824709767
Relates to #66163
Adds some extra debugging information to make it clear that you are
running `significant_text`. Also adds some timing information
around the `_source` fetch and the `terms` accumulation. This lets you
calculate a third useful timing number: the analysis time. It is
`collect_ns - fetch_ns - accumulation_ns`.
This also adds a half dozen extra REST tests to get a *fairly*
comprehensive set of the operations this supports. It doesn't cover all
of the significance heuristic parsing, but it's certainly much better
than what we had.
This commit adds a new pipeline aggregation that allows correlation of bucketed values within the aggregation framework.
The initial function is a `count_correlation` function. Its purpose is to correlate the count in a consistent number of buckets with a pre-calculated indicator. The indicator and the aggregated buckets should relate to the same metrics within the documents.
Example of correlating terms within `service.version.keyword` with latency percentiles. The percentiles and the provided correlation indicator both refer to the same source data, where the indicator was previously calculated:
```
GET apm-7.12.0-transaction-generated/_search
{
"size": 0,
"aggs": {
"field_terms": {
"terms": {
"field": "service.version.keyword",
"size": 20
},
"aggs": {
"latency_range": {
"range": {
"field": "transaction.duration.us",
"ranges": [<snip>],
"keyed": true
}
},
"correlation": {
"bucket_correlation": {
"buckets_path": "latency_range>_count",
"count_correlation": {
"indicator": {
"expectations": [<snip>],
"doc_count": 20000
}
}
}
}
}
}
}
}
```
Today we mention Metricbeat's `scope` parameter but offer no guidance
about how it should be used. This commit adds guidance to use `scope:
cluster`, especially on clusters with dedicated master-eligible nodes.
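As a sketch, the relevant Metricbeat module configuration would be along these lines (the host is a placeholder and the exact module layout is an assumption):
```
- module: elasticsearch
  xpack.enabled: true
  scope: cluster
  period: 10s
  hosts: ["http://localhost:9200"]
```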
Due to problems discovered in #72572 we have to disable the geoip downloader for now. We use ingest.geoip.downloader.enabled.default as a feature flag.
This change also reverts changes to docs.
This commit removes the bootstrap.system_call_filter setting, as
starting in Elasticsearch 8.0.0 we are going to require that system call
filters be installed and that this is not user configurable. Note that
while we force bootstrap to attempt to install system call filters, we
only enforce that they are installed via a bootstrap check in production
environments. We can consider changing this behavior, but leave that for
future consideration and thus a potential follow-up change.
We are going to require system call filters. This commit is the first
step in that journey, which is to deprecate the setting that allows
disabling system call filters.
The docs for the `filter` agg seemed to suggest that it was the
preferred way to filter results for aggs but it's really mostly for when
you need to filter things under another bucketing agg.
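The typical intended pattern is something like this (index and field names invented), where the filter narrows the documents feeding a metric nested under it:
```
GET sales/_search?size=0
{
  "aggs": {
    "t_shirts": {
      "filter": { "term": { "type": "t-shirt" } },
      "aggs": {
        "avg_price": { "avg": { "field": "price" } }
      }
    }
  }
}
```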
Co-authored-by: James Rodewig <40268737+jrodewig@users.noreply.github.com>
New API designed for use by apps like Kibana for auto-complete use cases.
A search string is supplied which is used as a prefix for matching terms found in a given field in the index.
Supported field types are keyword, constant_keyword and flattened.
A timeout can limit the amount of time spent looking for matches (default 1s), and an `index_filter` query can limit the indices searched, e.g. to those in the hot or warm tier, by querying the `_tier` field.
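A sketch of a request (the index, field, and filter are illustrative):
```
POST stackoverflow/_terms_enum
{
  "field": "tags",
  "string": "kib",
  "timeout": "2s",
  "index_filter": {
    "term": { "_tier": "data_hot" }
  }
}
```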
Closes #59137
Watcher uses a connection pool for outgoing HTTP traffic, which means
that some HTTP connections may live for a long time, possibly in an idle
state. Such connections may be silently torn down by a remote device, so
that when we re-use them we encounter a `Connection reset` or similar
error.
This commit introduces a setting allowing users to set a finite expiry
time on these connections, and also enables TCP keepalives on them by
default so that a remote teardown will be actively detected sooner.
Closes #52997
Changes:
* Renames 'full copy searchable snapshot' to 'fully mounted index.'
* Renames 'shared cache searchable snapshot' to 'partially mounted index.'
* Removes some unneeded cache setup instructions for the frozen tier. We added a default cache size with #71844.
* Remove frozen tier restriction for ESS
* Remove section from 'Use ES for time series data'
Co-authored-by: James Rodewig <40268737+jrodewig@users.noreply.github.com>
* [DOCS] Add field extraction use cases to scripting docs
* Adding file
* Remove extra space
* Add dissect pattern to split and retrieve data
* Fix list spacing
* Incorporating review feedback
This commit increases xpack.ml.max_open_jobs from 20 to 512. Additionally, it ignores nodes that cannot provide an accurate view into their native memory.
If a node does not have a view into its native memory, we ignore it for assignment.
This effectively fixes a bug with autoscaling. Autoscaling relies on jobs being assigned to nodes with adequate memory. If that is hampered by xpack.ml.max_open_jobs, scaling decisions are hampered as well.
With the introduction of BKD-based geo shape indexing in #32039, the prefix tree indexing method has
been deprecated. From 8.0.0, creating new mappings that use the deprecated parameters will no longer be allowed.
* [DOCS] Document how to migrate to node roles from node attrs. Closes #65855
* [DOCS] Incorporated review comments
* Update docs/reference/data-management/migrate-index-allocation-filters.asciidoc
Co-authored-by: Andrei Dan <andrei.dan@elastic.co>
Previously, the ResetFeatureStateStatus object captured its status in a
String, which meant that if we wanted to know if something succeeded or
failed, we'd have to parse information out of the string. This isn't a
good way of doing things.
I've introduced a SUCCESS/FAILURE enum for status constants, and added a
check for failures in the transport action. We return a 207 if some but not all
reset actions fail, and for every failure, we also return information about the
exception or error that caused it.
Co-authored-by: Jay Modi <jaymode@users.noreply.github.com>
We rely on the repository implementation correctly handling the case where a
write is aborted before it completes. This is not guaranteed for third-party
repositories.
This commit adds a rare action during analysis which aborts the write
just before it completes and verifies that the target blob is not found
by any node.
Adds support for the stats and top_metrics aggregations in transform. With this change it became
easier to add more multi-value aggregations to transform.
Limitations:
- only the 1st element of top_metrics gets consumed by transform[*].
- all values of stats will be mapped to double if mapping deduction is used, including count,
sum, min, max.
Fixes #52236
Relates #51925
When doing a rolling restart we recommend disabling shard allocation to
avoid unnecessary recoveries. However, this advice is unnecessary or
even harmful when restarting nodes that do not carry any data like a
pure ML node.
Today the only example of calling the cluster allocation explain API above the
fold is the bare `GET /_cluster/allocation/explain` which kind of works but is
not usually what the user wants. This commit changes the docs so that we open
with an example showing how we usually expect it to be called. This will make
it clearer that you should normally specify exactly for which shard you want an
explanation. It also tidies up a few other wrinkles in these docs.
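In other words, something along these lines (names are placeholders):
```
GET _cluster/allocation/explain
{
  "index": "my-index-000001",
  "shard": 0,
  "primary": false
}
```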
Co-authored-by: James Rodewig <40268737+jrodewig@users.noreply.github.com>
* Updates a cat test snippet to always return by index name in asc order
* Removes several leading slashes
* Reduces length of several snippet delimiters
Closes https://github.com/elastic/elasticsearch/issues/71683
This adds a new `match_only_text` field, which indexes the same data as a `text`
field that has `index_options: docs` and `norms: false` and uses the `_source`
for positional queries like `match_phrase`. Unlike `text`, this field doesn't
support scoring.
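A minimal mapping sketch (index and field names invented):
```
PUT my-index-000001
{
  "mappings": {
    "properties": {
      "message": { "type": "match_only_text" }
    }
  }
}
```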
This commit adds support for system data streams and also the first use
of a system data stream with the fleet action results data stream. A
system data stream is one that is used to store system data that users
should not interact with directly. Elasticsearch will manage these data
streams. REST API access is available for external system data streams
so that other stack components can store system data within a system
data stream. System data streams will not use the system index read and
write threadpools.
* [DOCS] Focus retrieving selected fields on fields parameter
* Incorporating changes from reviews
* Adding clarifications from review feedback
* Slight wording revisions.
* Clarify language around format parameter and move text out of callout.
Currently the fleet global checkpoints API returns immediately if
the index or shards are not ready. This commit modifies the
API to wait for the index and primary shards to become active, up until the timeout
period.
Related to #71449.
This commit revives the documentation of the "Clear Cache" and
"Shard Stats" APIs of Searchable Snapshots that was removed
in #62217. This is a partial revert of the commit b545c55 with
some light wording changes.
Co-authored-by: James Rodewig <40268737+jrodewig@users.noreply.github.com>
Dedicated frozen nodes can survive with less headroom than other data nodes.
This commit introduces a separate flood stage threshold for the frozen tier, as
well as an accompanying max_headroom setting that caps the amount of
free space necessary on frozen nodes.
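As a sketch in `elasticsearch.yml` (the exact setting names and values here are assumptions based on this change):
```
cluster.routing.allocation.disk.watermark.flood_stage.frozen: 95%
cluster.routing.allocation.disk.watermark.flood_stage.frozen.max_headroom: 20GB
```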
Relates #71844
Removes the experimental status for the frozen tier / shared_cache searchable snapshots for the 7.13 release.
Also adapts the docs to reflect that URL repositories are now supported for searchable snapshots in 7.13.
Changes:
* Refactors the "Getting Started" content down to one page.
* Refactors the README to reduce duplicated content and better mirror
Kibana's.
* Focuses the quick start on time series data, including data streams
and runtime fields.
* Streamlines self-managed install instructions to Docker.
Co-authored-by: debadair <debadair@elastic.co>
In update by query requests where max_docs < size and conflicts=proceed,
we weren't using the remaining documents from the scroll response in
cases where there were conflicts and the successful updates in the first
bulk request were < max_docs. This commit addresses that problem and
uses the remaining documents from the scroll response instead of
requesting a new page.
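A request of this shape (index name, query, and script invented) is the kind that was affected:
```
POST my-index-000001/_update_by_query?conflicts=proceed&max_docs=100&scroll_size=500
{
  "query": { "term": { "status": "pending" } },
  "script": { "source": "ctx._source.status = 'done'" }
}
```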
Closes #63671
The frozen tier only partially downloads shards.
introduces an autoscaling decider that scales the total storage
on the tier according to a configurable percentage relative to
the total data set size.
This commit adds the ability to define an index-time geo_point field
with a script parameter, allowing you to calculate points from other
values within the indexed document.
As we started thinking about applying on_script_error to runtime fields, to handle script errors at search time, we would like to use the same parameter that was recently introduced for indexed fields. We decided that `continue` or `fail` gives a better indication of the behaviour compared to the current `ignore` or `reject`, which are too specific to indexing documents.
This commit applies such rename.
This change exposes the newly introduced parameter `dynamic_templates`
in ingest. This parameter can be set by a set processor or a script processor.
Relates #69948
This commit adds some per-index statistics to the `SnapshotInfo` blob:
- number of shards
- total size in bytes
- maximum number of segments per shard
It also exposes these statistics in the get snapshot API.
Allow direct access to a dense_vector's values in scripts
through the following functions:
- getVectorValue – returns a vector's value as an array of floats
- getMagnitude – returns a vector's magnitude
Closes #51964
Frozen indices (partial searchable snapshots) require less heap per
shard and the limit can therefore be raised for those. We pick 3000
frozen shards per frozen data node, since we think 2000 is reasonable
to use in production.
Relates #71042 and #34021
This PR adds documentation for the GeoIPv2 auto-update feature.
It also renames the related settings from geoip.downloader.* to ingest.geoip.downloader.* to follow the same convention as the existing settings.
Relates to #68920
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
Co-authored-by: James Rodewig <40268737+jrodewig@users.noreply.github.com>
Currently the `fields` API fetches the root flattened field and returns it in a
structured way in the response. In addition this change makes it possible to
directly query subfields. However, requesting flattened subfields via wildcard
patterns is not possible.
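For example, with `labels` mapped as a `flattened` field (names invented):
```
GET my-index-000001/_search
{
  "fields": [ "labels.release" ],
  "_source": false
}
```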
Closes #70605
This PR introduces a new query called `combined_fields` for searching multiple
text fields. It takes a term-centric view, first analyzing the query string
into individual terms, then searching for each term in any of the fields as though
they were one combined field. It is based on Lucene's `CombinedFieldQuery`,
which takes a principled approach to scoring based on the BM25F formula.
This query provides an alternative to the `cross_fields` `multi_match` mode. It
has simpler behavior and a more robust approach to scoring.
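A basic sketch of the new query (field names invented):
```
GET /_search
{
  "query": {
    "combined_fields": {
      "query": "database systems",
      "fields": [ "title", "abstract", "body" ]
    }
  }
}
```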
Addresses #41106.
Fleet server needs an API to access up to date global checkpoints for
indices. Additionally, it requires a mode of operation where fleet can
provide its current knowledge about the global checkpoints and poll for
advancements. This commit introduces this API in the fleet plugin.
This commit allows you to set 'script' and 'on_script_error' parameters
on date field mappers, meaning that runtime date fields can be made indexed
simply by moving their definitions from the runtime section of the mappings
to the properties section.
We accept dates with a decimal point like `2113413.13241324` and parse
them *somehow*. But there are cases where we'll lose precision on those
dates, see #70085. This advises folks not to use that format. We'll
continue to accept those dates for backwards compatibility but you
should avoid using them.
Co-authored-by: Adrien Grand <jpountz@gmail.com>
* Warn users if security is implicitly disabled
Elasticsearch has security features implicitly disabled by default for
Basic and Trial licenses, unless explicitly set in the configuration
file.
This may be good for onboarding, but it also leads to unintended insecure
clusters.
This change introduces clear warnings when security features are
implicitly disabled.
- a warning header in each REST response if security is implicitly
disabled;
- a log message during cluster boot.
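To make the choice explicit (and avoid the new warning), security can be enabled or disabled explicitly in `elasticsearch.yml`:
```
xpack.security.enabled: true
```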
Runtime fields are much more flexible than script_fields because you
can filter and aggregate on them so we hope folks use them! This
converts the example of using a `parent_join` field in a script to a
runtime field so folks get used to seeing them and hopefully using them.
While I was editing this I took the opportunity to replace the script
with a real-ish example. Scripts that just load the field value are nice
and short but I hope no one uses them in real life because they just add
overhead when compared to accessing the field directly. So I made the
script do something.
Relates to #69291
This commit allows you to set 'script' and 'on_script_error' parameters
on IP field mappers, meaning that runtime IP fields can be made indexed
simply by moving their definitions from the runtime section of the mappings
to the properties section.
If enabled, the `delete_searchable_snapshot` option will attempt to delete the
index snapshot generated in any previous phase, for the purpose of mounting the
index as a searchable snapshot.
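As a sketch, within an ILM policy's delete phase (policy structure abridged):
```
"delete": {
  "actions": {
    "delete": {
      "delete_searchable_snapshot": true
    }
  }
}
```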
This shrinks a runtime field definition so that it fits on the screen
without scrolling. It also converts the doc into a test so we can be
sure it continues to work.
Relates to #69291
Runtime fields are much more flexible than script_fields because you
can filter and aggregate on them so we hope folks use them! This
converts the example of using a `date_nanos` field in a script to a
runtime field so folks get used to seeing them and hopefully using them.
While I was editing this I took the opportunity to replace the script
with a real-ish example. Scripts that just load the field value are nice
and short but I hope no one uses them in real life because they just add
overhead when compared to accessing the field directly. So I made the
script do something.
Relates to #69291
Co-authored-by: Adam Locke <adam.locke@elastic.co>
We have recently introduced the ability to associate an indexed field with a script. This commit updates the existing mappings stats to output stats about the script, similar to what we already do for runtime fields.
Today the response to `GET _cluster/state` does not include the roles of
the nodes in the cluster. In the past this made sense, roles were
relatively unchanging things that could be determined from elsewhere.
These days we have an increasingly rich collection of roles, with
nontrivial BWC implications, so it is important for debugging to be able
to see the specific roles as viewed by the master. This commit adds the
role names to the cluster state API output.
Relates #71385
Runtime fields are much more flexible than `script_fields` because you
can filter and aggregate on them so we hope folks use them! This
converts the example of using a `boolean` field in a script to a runtime
field so folks get used to seeing them and hopefully using them.
While I was editing this I took the opportunity to replace the script
with a real-ish example. Scripts that just load the field value are nice
and short but I hope no one uses them in real life because they just add
overhead when compared to accessing the field directly. So I made the
script do *something*.
Relates to #69291
This commit adds a deprecation note to the multiple data paths doc. It also removes mention of multiple paths support in the setup settings table.
relates #71205
This adds a "note" on the docs for the script query pointing folks to
runtime fields because they are more flexible. It also translates the
request example into runtime fields.
Relates to #69291
Co-authored-by: Adam Locke <adam.locke@elastic.co>
This replaces the `script` docs for bucket aggregations with runtime
fields. We expect runtime fields to be nicer to work with because you
can also fetch them or filter on them. We expect them to be faster
because they don't need this sort of `instanceof` tree:
a92a647b9f/server/src/main/java/org/elasticsearch/search/aggregations/support/values/ScriptDoubleValues.java (L42)
Relates to #69291
Co-authored-by: James Rodewig <40268737+jrodewig@users.noreply.github.com>
Co-authored-by: Adam Locke <adam.locke@elastic.co>
* Add warning admonition for removing runtime fields.
* Add cross-link to runtime fields.
* Expanding examples for runtime fields in a search request.
* Clarifying language and simplifying response tests.
We previously allowed but deprecated the ability for the shared cache to
be positively sized on nodes without the frozen role. This is because we
only allocate shared_cache searchable snapshots to nodes with the frozen
role. This commit completes our intention to deprecate/remove this
ability.
This PR sets the default value of `action.destructive_requires_name`
to `true`. Fixes #61074. Additionally, we set this value explicitly in
test classes that rely on wildcard deletions to clear test state.
This commit adds a script parameter to long and double fields that makes
it possible to calculate a value for these fields at index time. It uses the same
script context as the equivalent runtime fields, and allows for multiple index-time
scripted fields to cross-refer while still checking for indirection loops.
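A rough sketch of such a mapping (field names and the calculation are invented):
```
PUT my-index-000001
{
  "mappings": {
    "properties": {
      "voltage": { "type": "double" },
      "voltage_times_ten": {
        "type": "long",
        "script": {
          "source": "emit((long) (doc['voltage'].value * 10))"
        },
        "on_script_error": "fail"
      }
    }
  }
}
```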
With shared cache searchable snapshots we have shards that have a size
in S3 that differs from the locally occupied disk space. This commit
introduces `store.total_data_set_size` to node and indices stats, allowing one to
distinguish between the two.
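In the stats output this shows up alongside the existing store size, roughly like the following (values illustrative; the `_in_bytes` field name is an assumption):
```
"store": {
  "size_in_bytes": 212568,
  "total_data_set_size_in_bytes": 812568,
  "reserved_in_bytes": 0
}
```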
Relates #69820
This commit allows for composite aggregations in datafeeds.
Composite aggs provide a much better solution for having influencers, partitions, etc. on high volume data. Instead of worrying about long scrolls in the datafeed, the calculation is distributed across the cluster via the aggregations.
The restrictions for this support are as follows (see the example after this list):
- The composite aggregation must have EXACTLY one `date_histogram` source
- The sub-aggs of the composite aggregation must have a `max` aggregation on the SAME timefield as the aforementioned `date_histogram` source
- The composite agg must be the ONLY top level agg and it cannot have a `composite` or `date_histogram` sub-agg
- If using a `date_histogram` to bucket time, it cannot have a `composite` sub-agg.
- The top-level `composite` agg cannot have a sibling pipeline agg. Pipeline aggregations are supported as a sub-agg (thus a pipeline agg INSIDE the bucket).
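A sketch of a datafeed aggregation that satisfies these restrictions (field names, interval, and size are illustrative):
```
"aggregations": {
  "buckets": {
    "composite": {
      "size": 1000,
      "sources": [
        {
          "time_bucket": {
            "date_histogram": { "field": "@timestamp", "fixed_interval": "1h" }
          }
        }
      ]
    },
    "aggregations": {
      "@timestamp": { "max": { "field": "@timestamp" } },
      "bytes_mean": { "avg": { "field": "bytes" } }
    }
  }
}
```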
Some key user interaction differences:
- Speed + resources used by the cluster should be controlled by the `size` parameter in the `composite` aggregation. Previously, we said if you are using aggs, use a specific `chunking_config`. But, with composite, that is not necessary.
- Users really shouldn't use nested `terms` aggs any longer. While this is still a "valid" configuration and MAY be desirable for some users (only wanting the top 10 of certain terms), typically when users want influencers, partition fields, etc. they want the ENTIRE population. Previously, this really wasn't possible with aggs; with `composite` it is.
- I cannot really think of a typical use case that SHOULD ever use a multi-bucket aggregation that is NOT supported by composite.