This adds statistics to the usage API for data frame analytics
and trained models.
For data frame analytics the added stats are:
- count of jobs by analysis type
- stats for peak_usage_bytes
For trained models the added stats are:
- counts of: total, prepackaged, other (not created by data frame analytics)
- counts by analysis type based on the inference config
- stats for estimated heap usage
- stats for estimated number of operations
This commit adds support for the Gold+ licensed `geo_line` aggregation.
This aggregation takes a collection of `geo_point` values and constructs a line
according to some sort value. Adding it to transforms allows users to create these
potentially expensive lines out of band of visualizations and then do additional aggs/queries
against the pivoted data.
Examples would be:
"Do these daily user paths ever intersect?"
"Does this path enter and leave this area?"
* [DOCS] Adding grok support for runtime fields.
* Update response.
* Adding testresponse replacements.
* Update runtime field context and add dissect.
* Fixing backslash in the response.
* Fixing testresponse.
* Incorporating review feedback.
* Updates emit and adds cross link from ES runtime fields page.
Replace the `YYYY` and `uuuu` year patterns in the examples of
`DATETIME_FORMAT/PARSE` with the more common `yyyy`, to avoid confusing
users who might copy and paste those queries for their own use case.
Relates to #68030
Today if an index is set to `auto_expand_replicas: N-all` then we will
try and create a shard copy on every node that matches the applicable
allocation filters. This conflicts with shard allocation awareness and
the same-host allocation decider if there is an uneven distribution of
nodes across zones or hosts, since these deciders prevent shard copies
from being allocated unevenly and may therefore leave some unassigned
shards.
The point of these two deciders is to improve resilience given a limited
number of shard copies but there is no need for this behaviour when the
number of shard copies is not limited, so this commit suppresses them in
that case.
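The conflict can be illustrated with a small worked example. This is only a
simplified model, not the decider code: the function name is hypothetical, and
the rule that awareness caps each zone at an even share of the copies
(rounded up) is an assumption made for the sketch.

```python
import math

def assignable_copies(nodes_per_zone, enforce_awareness):
    """Count shard copies that can be assigned under `auto_expand_replicas: N-all`."""
    # N-all wants one shard copy on every matching node.
    wanted = sum(nodes_per_zone.values())
    if not enforce_awareness:
        return wanted
    # Assumed awareness rule: no zone may hold more than its even share.
    max_per_zone = math.ceil(wanted / len(nodes_per_zone))
    return sum(min(nodes, max_per_zone) for nodes in nodes_per_zone.values())

zones = {"zone-a": 3, "zone-b": 1}  # uneven distribution of nodes
print(assignable_copies(zones, enforce_awareness=True))   # 3, one copy stays unassigned
print(assignable_copies(zones, enforce_awareness=False))  # 4, every node gets a copy
```

With the cap enforced, one of the four wanted copies has nowhere to go. With
the cap lifted, every node receives a copy.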
Closes #54151
Closes #2869
Autoscaling expects data tiers to be used exclusively both for node
roles and in ILM policies. This commit adds a test demonstrating that
as well as documentation for the behavior.
Users can now specify runtime mappings as part of the source config
of a data frame analytics job. Those runtime mappings become part of
the mapping of the destination index. This ensures the fields are
accessible in the destination index even if the relevant data frame
analytics job gets deleted.
Closes #65056
* [DOCS] Add runtime field to glossary
* Update links with external refs
Co-authored-by: James Rodewig <40268737+jrodewig@users.noreply.github.com>
A `model_alias` allows trained models to be referred to by a user-defined moniker.
This not only improves the readability and simplicity of numerous API calls, but also allows for simpler deployment and upgrade procedures for trained models.
Previously, if you referenced a model ID directly within an ingest pipeline and later had a new model that performed better than the referenced one, you had to update the pipeline itself. If the model was used in numerous pipelines, all of those pipelines had to be updated.
When using a `model_alias` in an ingest pipeline, only that `model_alias` needs to be updated. Then, the underlying referenced model will change in place for all ingest pipelines automatically.
An additional benefit is that the referenced model is not changed until the new model is fully loaded into cache, so throughput is not hampered by changing models.
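The indirection can be sketched with a toy in-memory registry. This is not the
Elasticsearch implementation: the `ModelAliases` class, its methods, and the
model and pipeline names are all hypothetical and exist only to show why a
single alias update reaches every pipeline.

```python
class ModelAliases:
    """Toy alias registry: maps a model alias to the model ID it points at."""

    def __init__(self):
        self._alias_to_model = {}

    def put_alias(self, alias, model_id):
        # Creating and re-pointing an alias are the same operation here.
        self._alias_to_model[alias] = model_id

    def resolve(self, name):
        # Unknown names are treated as plain model IDs.
        return self._alias_to_model.get(name, name)

aliases = ModelAliases()
aliases.put_alias("flight-delay-model", "flight-delay-v1")

# Both pipelines reference the alias, never a concrete model ID.
pipelines = {"ingest-a": "flight-delay-model", "ingest-b": "flight-delay-model"}

# A better model arrives: re-point the alias once, leave the pipelines alone.
aliases.put_alias("flight-delay-model", "flight-delay-v2")

resolved = {name: aliases.resolve(ref) for name, ref in pipelines.items()}
print(resolved)  # both pipelines now resolve to flight-delay-v2
```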
This commit removes support for JAVA_HOME, following up on the earlier
deprecation of using JAVA_HOME to override the path for the JDK. Note
that we do not treat JAVA_HOME being set as a failure, as it is
perfectly reasonable for a user to have JAVA_HOME configured at the
system level.
This commit introduces a dedicated environment variable ES_JAVA_HOME to
determine the JDK used to start Elasticsearch (if not using the bundled
JDK). This environment variable will replace JAVA_HOME. We are making
this change because JAVA_HOME is a common environment variable and
sometimes users have it set in their environment from other JDK
applications that they have installed on their system. In this case,
they would accidentally end up not using the bundled JDK despite their
intentions. By using a dedicated environment variable specific to
Elasticsearch, we avoid this potential for conflict. With this commit,
we introduce the new environment variable, and deprecate the use of
JAVA_HOME. We will remove support for JAVA_HOME in a future commit.
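The lookup during the deprecation period can be sketched as follows. This is
an illustration rather than the actual startup script: the function name, the
warning text, and the paths are hypothetical, and the order of precedence
(ES_JAVA_HOME, then JAVA_HOME, then the bundled JDK) is an assumption.

```python
def resolve_jdk(env, bundled_jdk):
    """Return (jdk_path, warning) for the given environment mapping."""
    # The dedicated variable wins whenever it is set.
    if env.get("ES_JAVA_HOME"):
        return env["ES_JAVA_HOME"], None
    # JAVA_HOME is still honoured for now, but flagged as deprecated.
    if env.get("JAVA_HOME"):
        return env["JAVA_HOME"], "JAVA_HOME is deprecated, use ES_JAVA_HOME"
    # Nothing set: fall back to the JDK shipped with the distribution.
    return bundled_jdk, None

bundled = "/usr/share/elasticsearch/jdk"  # hypothetical install path
print(resolve_jdk({}, bundled))
print(resolve_jdk({"JAVA_HOME": "/opt/jdk8"}, bundled))
print(resolve_jdk({"JAVA_HOME": "/opt/jdk8", "ES_JAVA_HOME": "/opt/jdk15"}, bundled))
```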
* Reallocate runtime document
Reallocate document `runtime-fields-scriptless` from `runtime-search-request` to `runtime-mapping-fields`
* Move runtime without script section
Move runtime without script section to under the dynamic runtime mapping section
* Fix snippet formatting and remove discrete heading.
* Update test snippet.
Co-authored-by: Adam Locke <adam.locke@elastic.co>
This PR adds the special `_shard_doc` sort tiebreaker automatically to any
search requests that use a PIT. Adding the tiebreaker ensures that any
sorted query can be paginated consistently within a PIT.
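The effect on a request can be sketched as a rewrite of the search body. This
is a simplified illustration, not the server-side code: the helper names are
hypothetical, `sort` is assumed to be a list, and the choice to leave unsorted
requests untouched is an assumption.

```python
def sort_field(entry):
    """Return the field name of a sort entry given as "field" or {"field": ...}."""
    return entry if isinstance(entry, str) else next(iter(entry))

def with_pit_tiebreaker(request):
    """Return a copy of the search body with `_shard_doc` appended to its sort."""
    sort = list(request.get("sort", []))
    # Assumption: only sorted requests that use a PIT need the tiebreaker.
    if "pit" not in request or not sort:
        return dict(request)
    # Do not add the tiebreaker twice if the caller already supplied it.
    if all(sort_field(entry) != "_shard_doc" for entry in sort):
        sort.append("_shard_doc")
    return {**request, "sort": sort}

search = {"pit": {"id": "example-pit-id"}, "sort": [{"@timestamp": "desc"}]}
print(with_pit_tiebreaker(search)["sort"])
```

Two documents with the same `@timestamp` then always come back in the same
relative order, which keeps pagination within the PIT consistent.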
Closes #56828
Today we imply that CCR will automatically fall back to a full index
copy if it cannot replay any missing history. This was true for earlier
versions of the design, but we ultimately decided not to do this and never
adjusted the docs to match. This commit fixes the docs.
Currently, existing runtime fields can be updated but not removed. Updating makes it possible to correct mistakes, but once a runtime field is added to the index mappings, it cannot be removed.
With this commit we introduce the ability to remove an existing runtime field by providing a null value for it through the put mapping API. If no field with that name exists, the instruction has no effect on the other existing runtime fields.
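A request body for such a removal might be built as in the sketch below. The
helper and the field name `day_of_week` are hypothetical; the only thing taken
from the change itself is that a null value under `runtime` marks the field
for removal.

```python
import json

def runtime_removal_body(field_names):
    """Build a put-mapping body that removes the named runtime fields."""
    # None serializes to JSON null, which requests removal of the field.
    return {"runtime": {name: None for name in field_names}}

body = runtime_removal_body(["day_of_week"])
print(json.dumps(body))  # {"runtime": {"day_of_week": null}}
```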
Note that the removal of runtime fields makes the recently introduced assertRefreshItNotNeeded assertion trip, because when each local node merges mappings back in, the runtime fields that were previously removed by the master node get added back again locally. This is only a problem for the assertion that verifies that the removed refresh operation is never needed. For simplicity, we worked around this by tweaking the assertion to ignore runtime fields completely: it now asserts on the serialized merged mappings and incoming mappings without the corresponding runtime section.
Co-authored-by: Adam Locke <adam.locke@elastic.co>
Today we rely on blob stores behaving in a certain way so that they can be used
as a snapshot repository. There are an increasing number of third-party blob
stores that claim to be S3-compatible, but which may not offer a suitably
correct or performant implementation of the S3 API. We rely on some subtle
semantics with concurrent readers and writers, but some blob stores may not
implement them correctly. Hitting a corner case in the implementation may be rare
in normal use, and may be hard to reproduce or to distinguish from an
Elasticsearch bug.
This commit introduces a new `POST /_snapshot/.../_analyse` API which exercises
the more problematic corners of the repository implementation looking for
correctness bugs and measures the details of the performance of the repository
under concurrent load.