Adds an optional parameter to the _terms_enum request designed to allow paging.
The last term from a previous result can be passed as the search_after parameter to a subsequent request, meaning only terms after the given term (but still matching the provided string prefix) are returned
Relates to #72910
* wip
* Service Accounts - add beta documentation
* consistent names
* fix test
* Update service accounts overview and token creation files.
* Rename get service tokens to get service credentials
* fix tests
* Changes for create and get service tokens.
* Changes for get token creds, delete token, clear token cache, and token auth.
* add manage_service_account privilege to list
* List service accounts APIs
* Move xpack setting to Security API page, plus other cleanup.
* Shorten secret tokens in examples, add cross links, plus other cleanup.
* Clarifying parameter descriptions.
* Clarify language for authenticating with a token.
* Tweaks
* Typo fix
* Adding redirects to work around CI build checks
* Revert "Adding redirects to work around CI build checks"
This reverts commit 20a1b53591.
* Remove redirects that were implemented to satisfy CI checks in master branch
* Move note about not supporting basic auth
* Clarify what service accounts are specifically for
* Apply suggestions from code review
Co-authored-by: Tim Vernum <tim@adjective.org>
* Addressing review feedback
* tweak
* Improve doc tests
* fix test
Co-authored-by: Adam Locke <adam.locke@elastic.co>
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
Co-authored-by: Tim Vernum <tim@adjective.org>
* Deprecate single-tier allocation filtering settings
`(index|cluster).routing.allocation.(include|exclude|require)._tier` settings are now deprecated in
favor of using `index.routing.allocation.include._tier_preference`.
* Update deprecation message
We currently don't support `copy_to` for fields that take the form of objects
(e.g. `date_range` or certain kinds of `geo_point` variants). The current
problem with objects is that when DocumentParser parses anything other than
single values, it potentially advances the underlying parser past the value that
we would need to stay on for parsing the value again. While we might want to
support this in the future, for now this PR enhances the otherwise confusing
MapperParsingException with something more helpful and adds a short note in the
documentation about this restriction.
Closes#49344
Aliases to data streams can be defined via the existing update aliases api.
Aliases can either only refer to data streams or to indices (not both).
Also the existing get aliases api has been modified to support returning
aliases that refer to data streams.
Aliases for data streams are stored separately from data streams and
and refer to data streams by name and not to the backing indices of
a data stream. This means that when backing indices are added or removed
from a data stream that then the data stream alias doesn't need to be
updated.
The authorization model for aliases that refer to data streams is the
same as for aliases the refer to indices. In security privileges can
be defined on aliases, indices and data streams. When a privilege is
granted on an alias then access is also granted on the indices that
an alias refers to (irregardless whether privileges are granted or denied
on the actual indices). The same will apply for aliases that refer
to data streams. See for more details:
https://github.com/elastic/elasticsearch/issues/66163#issuecomment-824709767
Relates to #66163
Adds some extra debugging information to make it clear that you are
running `significant_text`. Also adds some using timing information
around the `_source` fetch and the `terms` accumulation. This lets you
calculate a third useful timing number: the analysis time. It is
`collect_ns - fetch_ns - accumulation_ns`.
This also adds a half dozen extra REST tests to get a *fairly*
comprehensive set of the operations this supports. It doesn't cover all
of the significance heuristic parsing, but its certainly much better
than what we had.
This commit adds a new pipeline aggregation that allows correlation within the aggregation frame work in bucketed values.
The initial function is a `count_correlation` function. The purpose of which is to correlate the count in a consistent number of buckets with a pre calculated indicator. The indicator and the aggregated buckets should related to the same metrics with in documents.
Example for correlating terms within a `service.version.keyword` with latency percentiles. The percentiles and provided correlation indicator both refer to the same source data where the indicator was previously calculated.:
```
GET apm-7.12.0-transaction-generated/_search
{
"size": 0,
"aggs": {
"field_terms": {
"terms": {
"field": "service.version.keyword",
"size": 20
},
"aggs": {
"latency_range": {
"range": {
"field": "transaction.duration.us",
"ranges": [<snip>],
"keyed": true
}
},
"correlation": {
"bucket_correlation": {
"buckets_path": "latency_range>_count",
"count_correlation": {
"indicator": {
"expectations": [<snip>],
"doc_count": 20000
}
}
}
}
}
}
}
}
```
Today we mention Metricbeat's `scope` parameter but offer no guidance
about how it should be used. This commit adds guidance to use `scope:
cluster`, especially on clusters with dedicated master-eligible nodes.
Due to problems discovered in #72572 we have to disable geoip downloader for now. We use ingest.geoip.downloader.enabled.default as feature flag.
This change also reverts changes to docs.
This commit removes the bootstrap.system_call_filter setting, as
starting in Elasticsearch 8.0.0 we are going to require that system call
filters be installed and that this is not user configurable. Note that
while we force bootstrap to attempt to install system call filters, we
only enforce that they are installed via a bootstrap check in production
environments. We can consider changing this behavior, but leave that for
future consideration and thus a potential follow-up change.
We are going to require system call filters. This commit is the first
step in that journey, which is to deprecate the setting that allows
disabling system call filters.
The docs for the `filter` agg seemed to suggest that it was the
preferred way to filter results for aggs but its really mostly for when
you need to filter things under another bucketing agg.
Co-authored-by: James Rodewig <40268737+jrodewig@users.noreply.github.com>
New api designed for use by apps like Kibana for auto-complete use cases.
A search string is supplied which is used as prefix for matching terms found in a given field in the index.
Supported field types are keyword, constant_keyword and flattened.
A timeout can limit the amount of time spent looking for matches (default 1s) and an `index_filter` query can limit indices e.g. those in the hot or warm tier by querying the `_tier` field
Closes#59137
Watcher uses a connection pool for outgoing HTTP traffic, which means
that some HTTP connections may live for a long time, possibly in an idle
state. Such connections may be silently torn down by a remote device, so
that when we re-use them we encounter a `Connection reset` or similar
error.
This commit introduces a setting allowing users to set a finite expiry
time on these connections, and also enables TCP keepalives on them by
default so that a remote teardown will be actively detected sooner.
Closes#52997
Changes:
* Renames 'full copy searchable snapshot' to 'fully mounted index.'
* Renames 'shared cache searchable snapshot' to 'partially mounted index.'
* Removes some unneeded cache setup instructions for the frozen tier. We added a default cache size with #71844.
* Remove frozen tier restriction for ESS
* Remove section from 'Use ES for time series data'
Co-authored-by: James Rodewig <40268737+jrodewig@users.noreply.github.com>
* [DOCS] Add field extraction use cases to scripting docs
* Adding file
* Remove extra space
* Add dissect pattern to split and retrieve data
* Fix list spacing
* Incorporating review feedback
This commit increases the xpack.ml.max_open_jobs from 20 to 512. Additionally, it ignores nodes that cannot provide an accurate view into their native memory.
If a node does not have a view into its native memory, we ignore it for assignment.
This effectively fixes a bug with autoscaling. Autoscaling relies on jobs with adequate memory to assign jobs to nodes. If that is hampered by the xpack.ml.max_open_jobs scaling decisions are hampered.
With the introduction of BKD-based geo shape indexing in #32039, the prefix tree indexing method has
been deprecated. From 8.0.0, it will not be allowed to create new mappings using deprecated parameters.
* [DOCS] Document how to migrate to node roles from node attrs. Closes#65855
* [DOCS] Incorporated review comments
* Update docs/reference/data-management/migrate-index-allocation-filters.asciidoc
Co-authored-by: Andrei Dan <andrei.dan@elastic.co>
Previously, the ResetFeatureStateStatus object captured its status in a
String, which meant that if we wanted to know if something succeeded or
failed, we'd have to parse information out of the string. This isn't a
good way of doing things.
I've introduced a SUCCESS/FAILURE enum for status constants, and added a
check for failures in the transport action. We return a 207 if some but not all
reset actions fail, and for every failure, we also return information about the
exception or error that caused it.
Co-authored-by: Jay Modi <jaymode@users.noreply.github.com>
We rely on the repository implementation correctly handling the case where a
write is aborted before it completes. This is not guaranteed for third-party
repositories.
This commit adds a rare action during analysis which aborts the write
just before it completes and verifies that the target blob is not found
by any node.
add support for the stats and top metrics aggregation in transform. With this change it became
easier to add more multi value aggregations to transform
Limitations:
- only the 1st element of top_metrics gets consumed by transform[*].
- all values of stats will be mapped to double if mapping deduction is used, including count,
sum, min, max
fixes#52236
relates #51925
When doing a rolling restart we recommend disabling shard allocation to
avoid unnecessary recoveries. However, this advise is unnecessary or
even harmful when restarting nodes that do not carry any data like a
pure ML node.
Today the only example of calling the cluster allocation explain API above the
fold is the bare `GET /_cluster/allocation/explain` which kind of works but is
not usually what the user wants. This commit changes the docs so that we open
with an example showing how we usually expect it to be called. This will make
it clearer that you should normally specify exactly for which shard you want an
explanation. It also tidies up a few other wrinkles in these docs.
Co-authored-by: James Rodewig <40268737+jrodewig@users.noreply.github.com>
* Updates a cat test snippet to always return by index name in asc order
* Removes several leading slashes
* Reduces length of several snippet delimiters
Closes https://github.com/elastic/elasticsearch/issues/71683
This adds a new `match_only_text` field, which indexes the same data as a `text`
field that has `index_options: docs` and `norms: false` and uses the `_source`
for positional queries like `match_phrase`. Unlike `text`, this field doesn't
support scoring.
This commit adds support for system data streams and also the first use
of a system data stream with the fleet action results data stream. A
system data stream is one that is used to store system data that users
should not interact with directly. Elasticsearch will manage these data
streams. REST API access is available for external system data streams
so that other stack components can store system data within a system
data stream. System data streams will not use the system index read and
write threadpools.
* [DOCS] Focus retrieving selected fields on fields parameter
* Incorporating changes from reviews
* Adding clarifications from review feedback
* Slight wording revisions.
* Clarify language around format parameter and move text out of callout.
Currently when the fleet global checkpoints API returns immediately if
the index is not ready or shards are not ready. This commit modifies the
API to wait on the index and primary shards active up until the timeout
period.
Related to #71449.
This commit revives the documentation of the "Clear Cache" and
"Shard Stats" APIs of Searchable Snapshots that was removed
in #62217. This is a partial revert of the commit b545c55 with
some light wording changes.
Co-authored-by: James Rodewig <40268737+jrodewig@users.noreply.github.com>
Dedicated frozen nodes can survive less headroom than other data nodes.
This commits introduces a separate flood stage threshold for frozen as
well as an accompanying max_headroom setting that caps the amount of
free space necessary on frozen.
Relates #71844
Removes the experimental status for the frozen tier / shared_cache searchable snapshots for the 7.13 release.
Also adapts docs that URL repositories are now supported in 7.13 for searchable snapshots.
Changes:
* Refactors the "Getting Started" content down to one page.
* Refactors the README to reduce duplicated content and better mirror
Kibana's.
* Focuses the quick start on time series data, including data streams
and runtime fields.
* Streamlines self-managed install instructions to Docker.
Co-authored-by: debadair <debadair@elastic.co>
In update by query requests where max_docs < size and conflicts=proceed
we weren't using the remaining documents from the scroll response in
cases where there were conflicts and in the first bulk request the
successful updates < max_docs. This commit address that problem and
use the remaining documents from the scroll response instead of
requesting a new page.
Closes#63671
The frozen tier partially downloads shards only. This commit
introduces an autoscaling decider that scales the total storage
on the tier according to a configurable percentage relative to
the total data set size.
This commit adds the ability to define an index-time geo_point field
with a script parameter, allowing you to calculate points from other
values within the indexed document.
As we started thinking about applying on_script_error to runtime fields, to handle script errors at search time, we would like to use the same parameter that was recently introduced for indexed fields. We decided that continue or fail gives a better indication of the behaviour compared to the current ignore or reject which is too specific to indexing documents.
This commit applies such rename.
This change exposes the newly introduced parameter `dynamic_templates`
in ingest. This parameter can be set by a set processor or a script processor.
Relates #69948
This commit adds some per-index statistics to the `SnapshotInfo` blob:
- number of shards
- total size in bytes
- maximum number of segments per shard
It also exposes these statistics in the get snapshot API.
Allow direct access to a dense_vector' values in script
through the following functions:
- getVectorValue – returns a vector's value as an array of floats
- getMagnitude – returns a vector's magnitude
Closes#51964
Frozen indices (partial searchable snapshots) require less heap per
shard and the limit can therefore be raised for those. We pick 3000
frozen shards per frozen data node, since we think 2000 is reasonable
to use in production.
Relates #71042 and #34021
This PR adds documentation for GeoIPv2 auto-update feature.
It also changes related settings names from geoip.downloader.* to ingest.geoip.downloader to have the same convention as current setting.
Relates to #68920
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
Co-authored-by: James Rodewig <40268737+jrodewig@users.noreply.github.com>
Currently the `fields` API fetches the root flattened field and returns it in a
structured way in the response. In addition this change makes it possible to
directly query subfields. However, requesting flattened subfields via wildcard
patterns is not possible.
Closes#70605
This PR introduces a new query called `combined_fields` for searching multiple
text fields. It takes a term-centric view, first analyzing the query string
into individual terms, then searching for each term any of the fields as though
they were one combined field. It is based on Lucene's `CombinedFieldQuery`,
which takes a principled approach to scoring based on the BM25F formula.
This query provides an alternative to the `cross_fields` `multi_match` mode. It
has simpler behavior and a more robust approach to scoring.
Addresses #41106.
Fleet server needs an API to access up to date global checkpoints for
indices. Additionally, it requires a mode of operation when fleet can
provide its current knowledge about the global checkpoints and poll for
advancements. This commit introduces this API in the fleet plugin.
This commit allows you to set 'script' and 'on_script_error' parameters
on date field mappers, meaning that runtime date fields can be made indexed
simply by moving their definitions from the runtime section of the mappings
to the properties section.
We accept dates with a decimal point like `2113413.13241324` and parse
them *somehow*. But there are cases where we'll lose precision on those
dates, see #70085. This advises folks not to use that format. We'll
continue to accept those dates for backwards compatibility but you
should avoid using them.
Co-authored-by: Adrien Grand <jpountz@gmail.com>
* Warn users if security is implicitly disabled
Elasticsearch has security features implicitly disabled by default for
Basic and Trial licenses, unless explicitly set in the configuration
file.
This may be good for onboarding, but it also lead to unintended insecure
clusters.
This change introduces clear warnings when security features are
implicitly disabled.
- a warning header in each REST response if security is implicitly
disabled;
- a log message during cluster boot.
Runtime fields are much more flexible than script_fields because you
can filter and aggregate on them so we hope folks use them! This
converts the example of using a `parent_join` field in a script to a
runtime field so folks get used to seeing them and hopefully using them.
While I was editing this I took the opportunity to replace the script
with a real-ish example. Scripts that just load the field value are nice
and short but I hope no one uses them in real life because they just add
overhead when compared to accessing the field directly. So I made the
script do something.
Relates to #69291
This commit allows you to set 'script' and 'on_script_error' parameters
on IP field mappers, meaning that runtime IP fields can be made indexed
simply by moving their definitions from the runtime section of the mappings
to the properties section.
If enabled, the `delete_searchable_snapshot` option will attempt to delete the
index snapshot generated in any previous phase, for the purpose of mounting the
index as a searchable snapshot.
This shrinks a runtime field definition so that it fits on the screen
without scrolling. It also converts the doc into a test so we can be
sure it continues to work.
Relates to #69291
Runtime fields are much more flexible than script_fields because you
can filter and aggregate on them so we hope folks use them! This
converts the example of using a `date_nanos` field in a script to a
runtime field so folks get used to seeing them and hopefully using them.
While I was editing this I took the opportunity to replace the script
with a real-ish example. Scripts that just load the field value are nice
and short but I hope no one uses them in real life because they just add
overhead when compared to accessing the field directly. So I made the
script do something.
Relates to #69291
Co-authored-by: Adam Locke <adam.locke@elastic.co>
We have recently introduced the ability to associate an indexed field with a script. This commit updates the existing mappings stats to output stats about the script, similar to what we already do for runtime fields.
Today the response to `GET _cluster/state` does not include the roles of
the nodes in the cluster. In the past this made sense, roles were
relatively unchanging things that could be determined from elsewhere.
These days we have an increasingly rich collection of roles, with
nontrivial BWC implications, so it is important for debugging to be able
to see the specific roles as viewed by the master. This commit adds the
role names to the cluster state API output.
Relates #71385
Runtime fields are much more flexible than `script_fields` because you
can filter and aggregate on them so we hope folks use them! This
converts the example of using a `boolean` field in a script to a runtime
field so folks get used to seeing them and hopefully using them.
While I was editing this I took the opportunity to replace the script
with a real-ish example. Scripts that just load the field value are nice
and short but I hope no one uses them in real life because they just add
overhead when compared to accessing the field directly. So I made the
script do *something*.
Relates to #69291
This commit adds a deprecation note to the multiple data paths doc. It also removes mention of multiple paths support in the setup settings table.
relates #71205
This adds a "note" on the docs for the script query pointing folks to
runtime fields because they are more flexible. It also translates the
request example into runtime fields.
Relates to #69291
Co-authored-by: Adam Locke <adam.locke@elastic.co>
This replaces the `script` docs for bucket aggregations with runtime
fields. We expect runtime fields to be nicer to work with because you
can also fetch them or filter on them. We expect them to be faster
because their don't need this sort of `instanceof` tree:
a92a647b9f/server/src/main/java/org/elasticsearch/search/aggregations/support/values/ScriptDoubleValues.java (L42)
Relates to #69291
Co-authored-by: James Rodewig <40268737+jrodewig@users.noreply.github.com>
Co-authored-by: Adam Locke <adam.locke@elastic.co>
* Add warning admonition for removing runtime fields.
* Add cross-link to runtime fields.
* Expanding examples for runtime fields in a search request.
* Clarifying language and simplifying response tests.
We previously allowed but deprecated the ability for the shared cache to
be positively sized on nodes without the frozen role. This is because we
only allocate shared_cache searchable snapshots to nodes with the frozen
role. This commit completes our intention to deprecate/remove this
ability.
This PR sets the default value of `action.destructive_requires_name`
to `true.` Fixes#61074. Additionally, we set this value explicitly in
test classes that rely on wildcard deletions to clear test state.
This commit adds a script parameter to long and double fields that makes
it possible to calculate a value for these fields at index time. It uses the same
script context as the equivalent runtime fields, and allows for multiple index-time
scripted fields to cross-refer while still checking for indirection loops.
With shared cache searchable snapshots we have shards that have a size
in S3 that differs from the locally occupied disk space. This commit
introduces `store.total_data_set_size` to node and indices stats, allowing to
differ between the two.
Relates #69820
This commit allows for composite aggregations in datafeeds.
Composite aggs provide a much better solution for having influencers, partitions, etc. on high volume data. Instead of worrying about long scrolls in the datafeed, the calculation is distributed across cluster via the aggregations.
The restrictions for this support are as follows:
- The composite aggregation must have EXACTLY one `date_histogram` source
- The sub-aggs of the composite aggregation must have a `max` aggregation on the SAME timefield as the aforementioned `date_histogram` source
- The composite agg must be the ONLY top level agg and it cannot have a `composite` or `date_histogram` sub-agg
- If using a `date_histogram` to bucket time, it cannot have a `composite` sub-agg.
- The top-level `composite` agg cannot have a sibling pipeline agg. Pipeline aggregations are supported as a sub-agg (thus a pipeline agg INSIDE the bucket).
Some key user interaction differences:
- Speed + resources used by the cluster should be controlled by the `size` parameter in the `composite` aggregation. Previously, we said if you are using aggs, use a specific `chunking_config`. But, with composite, that is not necessary.
- Users really shouldn't use nested `terms` aggs anylonger. While this is still a "valid" configuration and MAY be desirable for some users (only wanting the top 10 of certain terms), typically when users want influencers, partition fields, etc. they want the ENTIRE population. Previously, this really wasn't possible with aggs, with `composite` it is.
- I cannot really think of a typical usecase that SHOULD ever use a multi-bucket aggregation that is NOT supported by composite.
This change exposes for each field in the _field_caps response if the field is a metadata field.
This is needed for consumers of this API that want to filter these fields. Currently ML keeps a static list
and QL checks that the family type starts with `_`. In order to ease the addition of new metadata fields, this
change reworks the strategy in this solution and now only checks for the new flag.
Note that the new flag is also applied at the coordinator level in a best-effort to apply the logic on older nodes
in a mixed-version cluster.
* Make wildcard field use constant scoring queries for wildcard queries. Add a note about ignoring rewrite parameters on wildcard queries.
Also fixes caching issue where case sensitive and case insensitive results were cached as the same
Closes#69604
Previously, a datafeed and job must already exist for the `_preview` API to work.
With this change, users can get an accurate preview of the data that will be sent to the anomaly detection job
without creating either of them.
closes https://github.com/elastic/elasticsearch/issues/70264
* [DOCS] Clarify supported features for CCS.
* Clarify text and add subsection with title.
* Moving APIs to supported API section and paring down text.
* Removing security overview and condensing.
* Adding new security file.
* Minor changes.
* Removing link to pass build.
* Adding minimal security page.
* Adding minimal security page.
* Changes to intro.
* Add basic and basic + http configurations.
* Lots of changes, removed files, and redirects.
* Moving some AD and LDAP sections, plus more redirects.
* Redirects for SAML.
* Updating snippet languages and redirects.
* Adding another SAML redirect.
* Hopefully fixing the ci/2 error.
* Fixing another broken link for SAML.
* Adding what's next sections and some cleanup.
* Removes both security tutorials from the TOC.
* Adding redirect for removed tutorial.
* Add graphic for Elastic Security layers.
* Incorporating reviewer feedback.
* Update x-pack/docs/en/security/securing-communications/security-basic-setup.asciidoc
Co-authored-by: Ioannis Kakavas <ikakavas@protonmail.com>
* Update x-pack/docs/en/security/securing-communications/security-minimal-setup.asciidoc
Co-authored-by: Yang Wang <ywangd@gmail.com>
* Update x-pack/docs/en/security/securing-communications/security-basic-setup.asciidoc
Co-authored-by: Yang Wang <ywangd@gmail.com>
* Update x-pack/docs/en/security/index.asciidoc
Co-authored-by: Ioannis Kakavas <ikakavas@protonmail.com>
* Update x-pack/docs/en/security/securing-communications/security-basic-setup-https.asciidoc
Co-authored-by: Ioannis Kakavas <ikakavas@protonmail.com>
* Apply suggestions from code review
Co-authored-by: Ioannis Kakavas <ikakavas@protonmail.com>
Co-authored-by: Yang Wang <ywangd@gmail.com>
* Additional changes from review feedback.
* Incorporating reviewer feedback.
* Incorporating more reviewer feedback.
* Clarify that TLS is for authenticating nodes
Co-authored-by: Tim Vernum <tim@adjective.org>
* Clarify security between nodes
Co-authored-by: Tim Vernum <tim@adjective.org>
* Clarify that TLS is between nodes
Co-authored-by: Tim Vernum <tim@adjective.org>
* Update title for configuring Kibana with a password
Co-authored-by: Tim Vernum <tim@adjective.org>
* Move section for enabling passwords between Kibana and ES to minimal security.
* Add section for transport description, plus incorporate more reviewer feedback.
* Moving operator privileges lower in the navigation.
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
Co-authored-by: Ioannis Kakavas <ikakavas@protonmail.com>
Co-authored-by: Yang Wang <ywangd@gmail.com>
Co-authored-by: Tim Vernum <tim@adjective.org>
This adds named `teardown` support for doc tests similar to its support
for named `setup` section. This is useful when many doc files want to
share a similar `setup` AND `teardown`. I've introduced an example of
this in the CCR docs just to prove its works. We expect we'll use it for
datastreams as well.
Closes#70830
Co-authored-by: James Rodewig <40268737+jrodewig@users.noreply.github.com>
This adds a heading for `shard_min_doc_count` and merges the paragraphs
for them. I wanted to link to this section earlier today and it wasn't a
"real" section so I couldn't.
Co-authored-by: James Rodewig <40268737+jrodewig@users.noreply.github.com>
This adds additional documentation for shared_cache searchable snapshots that are targeting the frozen tier:
- it generalizes the introduction section on searchable snapshots, mentioning that they come in two flavors now
as well as the relation to cold and frozen tiers,
- it expands the shared_cache section and
- it adds Cloud-specific instructions for getting started with the frozen tier
Co-authored-by: James Rodewig <james.rodewig@elastic.co>
Co-authored-by: debadair <debadair@elastic.co>
Co-authored-by: David Turner <david.turner@elastic.co>
* Implement dedicated client version compatibility
Add further dedicated client (xDBC, CLI) compatibility rules and
document these. A client is version-compatible with the server if:
- it supports version compatibility (past or on 7.7.0); and
- it's not on a version newer than server's; and
- it's major version is at most one unit behind server's.
Co-authored-by: James Rodewig <40268737+jrodewig@users.noreply.github.com>
* Initial changes for scripting.
* Shorten script examples.
* Expanding types docs.
* Updating types.
* Fixing broken cross-link.
* Fixing map error.
* Incorporating review feedback.
* Fixing broken table.
* Adding more info about reference types.
* Fixing broken path.
* Adding more info an examples for def type.
* Adding more info on operators.
* Incorporating review feedback.
* Adding notconsole for example.
* Removing comments in example.
* More review feedback.
* Editorial changes.
* Incorporating more reviewer feedback.
* Rewrites based on review feedback.
* Adding new sections for storing scripts and shortening scripts.
* Adding redirect for stored scripts.
* Adding DELETE for stored script plus link.
* Adding section for updating docs with scripts.
* Incorporating final feedback from reviews.
* Tightening up a few areas.
* Minor change around other languages.
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
This aims at making the shrink action retryable. Every step is
retryable, but in order to provide an experience where ILM tries
to achieve a successful shrink even when the target node goes
missing permanently or the shrunk index cannot recover, this also
introduces a retryable shrink cycle within the shrink action.
The shrink action will generate a unique index name that'll be the
shrunk index name. The generated index name is stored in the lifecycle
state.
If the shrink action ends up waiting for the source shards to
colocate or for the shrunk index to recover for more than the configured
`LIFECYCLE_STEP_WAIT_TIME_THRESHOLD` setting, it will move back
to clean up the attempted (and failed) shrunk index and will retry
generating a new index name and attempting to shrink the source
to the newly generated index name.
We document that master nodes should have a persistent data path but
it's a bit hard to understand that this is what the docs are saying and
we don't really say why it's important. This commit clarifies this
paragraph.
Relates 49d0f3406c
Today the docs on node roles say that you shouldn't use dedicated
masters for heavy requests such as indexing and searching, but as per
the "designing for resilience" docs this guidance applies to all client
requests. This commit generalises the node roles docs slightly to
clarify this.
Relates #70435
This commit allows documents seen within the same time bucket to be out of order.
This is already supported within the native process.
Additionally, when recording the "latest" record timestamp, we were assuming that the latest seen document was truly the "latest". This is not really the case if latency is utilized or if documents come out of order within the same bucket.
This commit updates the default format of date_nanos field
on existing and new indices to use `strict_date_optional_time_nanos` instead of
`strict_date_optional_time`.
Using `strict_date_optional_time` as the default format for date_nanos doesn't
make sense because it accepts and parses dates with nanosecond precision,
but when it formats it drops the nanoseconds.
The change should be transparent for users, these formats accept the same input.
Relates #69192Closes#67063
If a search after request targets multiple indices and some of its sort
field has type `date` in one index but `date_nanos` in other indices,
then Elasticsearch won't interpret the search_after parameter correctly
in every target index. The sort value of a date field by default is a
long of milliseconds since the epoch while a date_nanos field is a long
of nanoseconds.
This commit introduces the `format` parameter in the sort field so a
sort value of a date or date_nanos will be formatted using a date format
in a search response.
The below example illustrates how to use this new parameter.
```js
{
"query": {
"match_all": {}
},
"sort": [
{
"timestamp": {
"order": "asc",
"format": "strict_date_optional_time_nanos"
}
}
]
}
```
```js
{
"query": {
"match_all": {}
},
"sort": [
{
"timestamp": {
"order": "asc",
"format": "strict_date_optional_time_nanos"
}
}
],
"search_after": [
"2015-01-01T12:10:30.123456789Z" // in `strict_date_optional_time_nanos` format
]
}
```
Closes#69192
Add support to delete component templates api to specify multiple template
names separated by a comma.
Change the cleanup template logic for rest tests to remove all component templates via a single delete component template request. This to optimize the cleanup logic. After each rest test we delete all templates. So deleting templates this via a single api call (and thus single cluster state update) saves a lot of time considering the number of rest tests.
Older versions don't support component / composable index templates
and/or data streams. Yet the test base class tries to remove objects
after each test, which adds a significant number of lines to the
log files (which slows the tests down). The ESRestTestCase will
now check whether all nodes have a specific version and then decide
whether data streams and component / composable index templates will
be deleted.
Also ensured that the logstash-index-template and security-index-template
aren't deleted between tests, these templates are builtin templates that
ES will install if missing. So if tests remove these templates between tests
then ES will add these template back almost immediately. These causes
many log lines and a lot of cluster state updates, which slow tests down.
Relates to #69973
Co-authored-by: Lee Hinman <dakrone@users.noreply.github.com>
This commit addresses two aspects of the description in the docs of
configuring a local node to be a remote cluster client. First, the
documentation was referring to the legacy setting for configuring a
remote cluster client. Secondly, we clarify that additional features,
not only cross-cluster search, have requirements around the usage of the
remote_cluster_client role.
Co-authored-by: Przemysław Witek <przemyslaw.witek@elastic.co>
Co-authored-by: James Rodewig <40268737+jrodewig@users.noreply.github.com>
Add support to delete composable index templates api to specify multiple template
names separated by a comma.
Change to cleanup template logic for rest tests to remove all composable index templates via a single delete composable index template request. This to optimize the cleanup logic. After each rest test we delete all templates. So deleting templates this via a single api call (and thus single cluster state update) saves a lot of time considering the number of rest tests.
If this pr is accepted then I will do the same change for the delete component template api.
Relates to #69973
Support for additional Client authentication methods was added in
the OIDC realm in #58708. This change adds the `rp.client_auth_method`
and `rp.client_auth_signature_algorithm` settings in the realm settings
reference doc.
Type configuration parameter was removed in 7.0. This change cleans
up some sentences where references to it had remained even after
we removed the parameter itself.
This commit changes the frozen phase within ILM in the following ways:
- The `searchable_snapshot` action now no longer takes a `storage` parameter. The storage type is
determined by the phase within which it is invoked (shared cache for frozen and full copy for
everything else).
- The frozen phase in ILM now no longer allows *any* actions other than `searchable_snapshot`
- If a frozen phase is provided, it *must* include a `searchable_snapshot` action.
These changes may seem breaking, but since they are intended to go back to 7.12 which has not been
released yet, they are not truly breaking changes.
This field mapper only lived in its own module so it could be licensed as x-pack
basic. Now it can be moved to core, which matches its status as a core type.
When performing a multi_match in cross_fields mode, we group fields based on
their analyzer and create a blended query per group. Our docs claimed that the
group scores were combined through a boolean query, but they are actually
combined through a dismax that incorporates the tiebreaker parameter.
This commit updates the docs and adds a test verifying the behavior.
It can be confusing to configure policies with phase timings that get smaller, because phase timings
are absolute. To make things a little clearer, this commit now rejects policies where a configured
min_age is less than a previous phase's min_age.
This validation is added only to the PutLifecycleAction.Request instead of the
TimeseriesLifecycleType class because we cannot do this validation every time a lifecycle is
created or else we will block cluster state from being recoverable for existing clusters that may
have invalid policies.
Resolves#70032
- adds a bit more overview on the process, including noting that it
works in terms of files
- notes that the snapshot is a point-in-time view of each shard, and not
necessarily exactly at the start of the snapshot process
- documents the `snapshot.max_concurrent_operations` setting
Co-authored-by: James Rodewig <40268737+jrodewig@users.noreply.github.com>
Remove not completely correct statement about the size of dense_vectors
We do store a dense_vector as binary doc value with size `4*dims+4`.
But this is size before compression. As compressed size depends on
data itself, it is better to remove completely any statement
about the size.
Runtime fields usage is currently reported as part of the xpack feature usage API. Now that runtime fields are part of server, their corresponding stats can be moved to be part of the ordinary mapping stats exposed by the cluster stats API.
This test was sorting by store.size, but these indices could end up with the same store size and
then the sorting would occasionally be wrong for the test.
Resolves#51619
The tip about updating a `search_analyzer` currently does not mention that most
of the time (when the current analyzer is not "default"), user need to repeat
the currently set "analyzer" parameter in the field definition. Adding this as a
short note.
You can't update the `analyzer` parameter in the PUT mappings API even if
the index is closed. This adds a TIP to call that out. And adds a TIP
for `search_quote_analyzer` which you *can* update.
The endpoint `_snapshottable_features` is long and implies incorrect
things about this API - it is used not just for snapshots, but also for
the upcoming reset API. Following discussions on the team, this commit
changes the endpoint to `_features` and removes the connection between
this API and snapshots, as snapshots are not the only use for the output
of this API.
We expect runtime fields to perform a little better than our "native"
aggregation script so we should point folks to them instead of the
"native" aggregation script.
* Support audit ignore policy by index privileges
Adding new audit ignore policy - privileges
For example, following policy will filter out all events, which actions
minimal required privilege is either "read" or "delete":
xpack.security.audit.logfile.events.ignore_filters:
example:
privileges: ["read", "delete"]
Resolve: #60877
Related: #10836
Related: #37148
* Support audit ignore policy by index privileges
Adding new audit ignore policy - privileges
For example, following policy will filter out all events, which actions
required privilege is either "read" or "delete":
xpack.security.audit.logfile.events.ignore_filters:
example:
privileges: ["read", "delete"]
Resolve: #60877
Related: #10836
Related: #37148
* To avoid ambiguity (as cluster and index policies may have the same
name) changing implementation to have to separate policies for
`index_privileges` and `cluster_privileges`.
If both are set for the same policy, throw the IllegalArgumentException.
* To avoid ambiguity (as cluster and index policies may have the same
name) changing implementation to have to separate policies for
`index_privileges` and `cluster_privileges`.
If both are set for the same policy, throw the IllegalArgumentException.
* Fixing Api key related privilege check which expects request and
authentication by introducing overloaded
version of findPrivilegesThatGrant
just checking if privileges which can grant the action regardless of the
request and authentication context.
* Fixing a test; adding a caching mechanism to avoid calling
findPrivilegesThatGrant each
time.
* Support audit ignore policy by index privileges
Addressing review feedback
* Support audit ignore policy by index privileges
Addressing review comments + changing approach:
- use permission check instead of simple "checkIfGrants"
- adding more testing
* Support audit ignore policy by index privileges
Addressing review comments + changing approach:
- use permission check instead of simple "checkIfGrants"
- adding more testing
* Support audit ignore policy by index privileges
Addressing review comments + changing approach:
- use permission check instead of simple "checkIfGrants"
- adding more testing
* Support audit ignore policy by index privileges
Addressing review comments + changing approach:
- use permission check instead of simple "checkIfGrants"
- adding more testing
* Revert "Support audit ignore policy by index privileges"
This reverts commit 152821e7
* Revert "Support audit ignore policy by index privileges"
This reverts commit 79649e9a
* Revert "Support audit ignore policy by index privileges"
This reverts commit 96d22a42
* Revert "Support audit ignore policy by index privileges"
This reverts commit 67574b2f
* Revert "Support audit ignore policy by index privileges"
This reverts commit 35573c8b
* Revert "Fixing a test; adding a caching mechanism to avoid calling findPrivilegesThatGrant each time."
This reverts commit 7faa52f3
* Revert "Fixing Api key related privilege check which expects request and authentication by introducing overloaded version of findPrivilegesThatGrant just checking if privileges which can grant the action regardless of the request and authentication context."
This reverts commit 72b9aefe
* Revert "To avoid ambiguity (as cluster and index policies may have the same name) changing implementation to have to separate policies for `index_privileges` and `cluster_privileges`. If both are set for the same policy, throw the IllegalArgumentException."
This reverts commit 7dd8fe7d
* Revert "To avoid ambiguity (as cluster and index policies may have the same name) changing implementation to have to separate policies for `index_privileges` and `cluster_privileges`. If both are set for the same policy, throw the IllegalArgumentException."
This reverts commit cb5bc09c
* Revert "Support audit ignore policy by index privileges"
This reverts commit a918da10
* Support audit ignore policy by actions
Getting back to action filtering
* Support audit ignore policy by actions
Cleaning up some tests
* Support audit ignore policy by actions
Cleaning up some tests
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
This commit adds a new `_preview` endpoint for data frame analytics.
This allows users to see the data on which their model will be trained. This is especially useful
in the arrival of custom feature processors.
The API design is a similar to datafeed `_preview` and data frame analytics `_explain`.
Adds support for the include_unloaded_segments flag in node stats, which helps with understanding resource usage of
shared_cache-style searchable snapshots on a per-node basis.
This commit moves away from the static `rollup-{indexName}` rollup index
naming strategy and moves towards a randomized rollup index name scheme.
This will reduce the complications that exist if the RollupStep fails and retries
in any way. A separate cleanup will still be required for failed temporary indices,
but at least there will not be a conflict.
This commit generates the new rollup index name in the LifecycleExecutionState so
that it can be used in RollupStep and UpdateRollupIndexPolicyStep on a per-index
basis.
This adds additional statistics into the usage API for data frame analytics
and trained models.
For data frame analytics the added stats are:
- count of jobs by analysis type
- stats for peak_usage_bytes
For trained models the added stats are:
- counts of: total, prepackaged, other (not created by data frame analytics)
- counts by analysis type based on the inference config
- stats for estimated heap usage
- stats for estimated number of operations
This commit adds support for the Gold+ licensed `geo_line` aggregation.
This aggregation takes a collection of `geo_point` values and constructs a line
according to some sort value. Adding to transforms allows users to create these
potentially expensive lines out of band of visualizations and then do additional aggs/queries
against the pivoted data.
Examples would be:
"Do these daily user paths ever intersect?"
"Does this path enter and leave this area?"
* [DOCS] Adding grok support for runtime fields.
* Update response.
* Adding testresponse replacements.
* Update runtime field context and add dissect.
* Fixing backslash in the response.
* Fixing testresponse.
* Incorporating review feedback.
* Updates emit and adds cross link from ES runtime fields page.
To avoid confusion for the users replace the `YYYY` and `uuuu` year
patterns in the examples of `DATETIME_FORMAT/PARSE` with the most common
`yyyy` to avoid any confusion for users that might just copy paste those
queries for their own use case.
Relates to #68030
Today if an index is set to `auto_expand_replicas: N-all` then we will
try and create a shard copy on every node that matches the applicable
allocation filters. This conflits with shard allocation awareness and
the same-host allocation decider if there is an uneven distribution of
nodes across zones or hosts, since these deciders prevent shard copies
from being allocated unevenly and may therefore leave some unassigned
shards.
The point of these two deciders is to improve resilience given a limited
number of shard copies but there is no need for this behaviour when the
number of shard copies is not limited, so this commit supresses them in
that case.
Closes#54151Closes#2869
Autoscaling expects data tiers to be used exclusively both for node
roles and in ILM policies. This commit adds a test demonstrating that
as well as documentation for the behavior.
Users can now specify runtime mappings as part of the source config
of a data frame analytics job. Those runtime mappings become part of
the mapping of the destination index. This ensures the fields are
accessible in the destination index even if the relevant data frame
analytics job gets deleted.
Closes#65056
* [DOCS] Add runtime field to glossary
* Update links with external refs
Co-authored-by: James Rodewig <40268737+jrodewig@users.noreply.github.com>
Co-authored-by: James Rodewig <40268737+jrodewig@users.noreply.github.com>
A `model_alias` allows trained models to be referred by a user defined moniker.
This not only improves the readability and simplicity of numerous API calls, but it allows for simpler deployment and upgrade procedures for trained models.
Previously, if you referenced a model ID directly within an ingest pipeline, when you have a new model that performs better than an earlier referenced model, you have to update the pipeline itself. If this model was used in numerous pipelines, ALL those pipelines would have to be updated.
When using a `model_alias` in an ingest pipeline, only that `model_alias` needs to be updated. Then, the underlying referenced model will change in place for all ingest pipelines automatically.
An additional benefit is that the model referenced is not changed until it is fully loaded into cache, this way throughput is not hampered by changing models.
This commit removes support for JAVA_HOME. As we previously deprecated
usage of JAVA_HOME to override the path for the JDK, this commit follows
up by removing support for JAVA_HOME. Note that we do not treat
JAVA_HOME being set as a failure, as it is perfectly reasonable for a
user to have JAVA_HOME configured at the system level.
This commit introduces a dedicated envirnoment variable ES_JAVA_HOME to
determine the JDK used to start (if not using the bundled JDK). This
environment variable will replace JAVA_HOME. The reason that we are
making this change is because JAVA_HOME is a common environment variable
and sometimes users have it set in their environment from other JDK
applications that they have installed on their system. In this case,
they would accidentally end up not using the bundled JDK despite their
intentions. By using a dedicated environment variable specific to
Elasticsearch, we avoid this potential for conflict. With this commit,
we introduce the new environment variable, and deprecate the use of
JAVA_HOME. We will remove support for JAVA_HOME in a future commit.
* Reallocate runtime document
Reallocate document `runtime-fields-scriptless` from `runtime-search-request` to `runtime-mapping-fields`
* Move runtime without script section
Move runtime without script section to under the dynamic runtime mapping section
* Fix snippet formatting and remove discrete heading.
* Update test snippet.
Co-authored-by: Adam Locke <adam.locke@elastic.co>
This PR adds the special `_shard_doc` sort tiebreaker automatically to any
search requests that use a PIT. Adding the tiebreaker ensures that any
sorted query can be paginated consistently within a PIT.
Closes#56828
Today we imply that CCR will automatically fall back to a full index
copy if it cannot replay any missing history. This was true for earlier
versions of the design but we ultimately decided not to do this without
adjusting the docs to match. This commit fixes the docs.
Currently, existing runtime fields can be updated, but they cannot be removed. That allows to correct potential mistakes, but once a runtime field is added to the index mappings, it is not possible to remove it.
With this commit we introduce the ability to remove an existing runtime field by providing a null value for it through the put mapping API. If a field with such name does not exist, such specific instruction will have no effect on other existing runtime fields.
Note that the removal of runtime fields makes the recently introduced assertRefreshItNotNeeded assertion trip, because when each local node merges mappings back in, the runtime fields that were previously removed by the master node, get added back again locally. This is only a problem for the assertion that verifies that the removed refresh operation is never needed. We worked around this by tweaking the assertion to ignore runtime fields completely, for simplicity, by assertion on the serialized merged mappings and incoming mappings without the corresponding runtime section.
Co-authored-by: Adam Locke <adam.locke@elastic.co>