This commit adds the `data_frozen` node role as part of the formalization of data tiers. It also
adds the `"frozen"` phase to ILM, currently allowing the same actions as the existing cold phase.
The frozen phase is intended to be used for data even less frequently searched than the cold phase,
and will eventually be loosely tied to data using partial searchable snapshots (as oppposed to full
searchable snapshots in the cold phase).
Relates to #60848
Fixed the inconsistencies regarding NULL argument handling.
NULL literal vs NULL field value as function arguments in some case
resulted in different function return values.
Functions should return with the same value no matter if the argument(s)
came from a field or from a literal.
The introduced integration test tests if function calls with same
argument values (regardless of literal/field) will return with the
same output (also checks if newly added functions are added to the
testcases).
Fixed the following functions:
* Insert: NULL start, length and replacement arguments (as fields) also
result in NULL return value instead of returning the input.
* Locate: NULL pattern results in NULL return value, NULL optional start
argument handled the same as missing start argument
* Replace: NULL pattern and replacement results in NULL instead of
returning the input
* Substring: NULL start or length results in NULL instead of returning
the input
Fixes#58907
Changes:
- Reworks the ILM tutorial to focus on the Elastic Agent and a built-in ILM policy
- Updates several screenshots in the docs for the new ILM UI
Co-authored-by: debadair <debadair@elastic.co>
A frozen tier is backed by an external object store (like S3) and caches only a
small portion of data on local disks. In this way, users can reduce hardware
costs substantially for infrequently accessed data. For the frozen tier we only
pull in the parts of the files that are actually needed to run a given search.
Further, we don't require the node to have enough space to host all the files.
We therefore have a cache that manages which file parts are available, and which
ones not. This node-level shared cache is bounded in size (typically in relation
to the disk size), and will evict items based on a LFU policy, as we expect some
parts of the Lucene files to be used more frequently than other parts. The level
of granularity for evictions is at the level of regions of a file, and does not
require evicting full files. The on-disk representation that was chosen for the
cold tier is not a good fit here, as it won't allow evicting parts of a file.
Instead we are using fixed-size pre-allocated files and have implemented our own
memory management logic to map regions of the shard's original Lucene files onto
regions in these node-level shared files that are representing the on-disk
cache.
This PR adds the core functionality to searchable snapshots to power such a
frozen tier:
- It adds the node-level shared cache that evicts file regions based on a LFU
policy
- It adds the machinery to dynamically download file regions into this cache and
serve their contents when searches execute.
- It extends the mount API with a new parameter, `storage`, which selects the
kind of local storage used to accelerate searches of the mounted index. If set
to `full_copy` (default, used for cold tier), each node holding a shard of the
searchable snapshot index makes a full copy of the shard to its local storage.
If set to `shared_cache`, the shard uses the newly introduced shared cache,
only holding a partial copy of the index on disk (used for frozen tier).
Co-authored-by: Tanguy Leroux <tlrx.dev@gmail.com>
Co-authored-by: Armin Braun <me@obrown.io>
Co-authored-by: David Turner <david.turner@elastic.co>
This commit sets the recovery rate for dedicated cold nodes. The goal is
here is enhance performance of recovery in a dedicated cold tier, where
we expect such nodes to be predominantly using searchable snapshots to
back the indices located on them. This commit follows a simple approach
where we increase the recovery rate as a function of the node size, for
nodes that appear to be dedicated cold nodes.
Adds a multi_terms aggregation support. The multi terms aggregation works
very similarly to the terms aggregation but supports multiple terms. The goal
of this PR is to add the basic functionality so it is not optimized at the
moment. It will be done in follow up PRs.
Closes#65623
This commit removes the following deprecated settings in v8:
- `gateway.expected_nodes`
- `gateway.expected_master_nodes`
- `gateway.recover_after_nodes`
- `gateway.recover_after_master_nodes`
Co-authored-by: ShawnLi1014 <shawnli1014@gmail.com>
As per the new licensing change for Elasticsearch and Kibana this commit
moves existing Apache 2.0 licensed source code to the new dual license
SSPL+Elastic license 2.0. In addition, existing x-pack code now uses
the new version 2.0 of the Elastic license. Full changes include:
- Updating LICENSE and NOTICE files throughout the code base, as well
as those packaged in our published artifacts
- Update IDE integration to now use the new license header on newly
created source files
- Remove references to the "OSS" distribution from our documentation
- Update build time verification checks to no longer allow Apache 2.0
license header in Elasticsearch source code
- Replace all existing Apache 2.0 license headers for non-xpack code
with updated header (vendored code with Apache 2.0 headers obviously
remains the same).
- Replace all Elastic license 1.0 headers with new 2.0 header in xpack.
This change adds a new "architectures" section to the
cluster stats, containing a summary of how many nodes
in the cluster are on each processor architecture.
The intention is to make it easier to see whether
clusters are running on aarch64, or mixed x86_64/aarch64,
which may aid support as aarch64 becomes more commonly
used.
Refactoring of cat transform to show more relevant information. The current cat transform shows a
lot of configuration details, however cat should show operationally useful information. This PR
changes the defaults and also adds when transform did a search last.
Taking a snapshot of a cluster containing searchable snapshot indices is
kind of mindbending. This commit adds docs to indicate that this does
work.
Co-authored-by: James Rodewig <40268737+jrodewig@users.noreply.github.com>
* [DOCS] Minor rewording for HTTP settings.
* Revert "[DOCS] Minor rewording for HTTP settings."
This reverts commit 9a831adca6.
* Adds advanced wording to HTTP & transport settings.
Today's network config docs are split into "Network", "HTTP" and
"Transport" pages, with unclear relationships between them. We often
encounter users with weird configs that indicate they don't really
understand how these settings all relate. In fact these pages are all
very interrelated, and the HTTP and Transport pages are almost all only
for advanced users. This commit brings these docs into a single page and
rewords some things to try and guide users away from the advanced
settings unless their configuration needs all the extra complexity.
It also adds a section entitled "Binding and publishing" which clarifies
the meanings of the `bind_host` and `publish_host` parameters. This is
also a common source of confusion amongst users.
It also clarifies that many of these settings accept a list of
addresses, and warns that this may not be what you want. Closes#67956.
Co-authored-by: Adam Locke <adam.locke@elastic.co>
The PR adds early_stopping_enabled optional data frame analysis configuration parameter. The enhancement was already described in elastic/ml-cpp#1676 and so I mark it here as non-issue.