Removes the `@timestamp` field mapping from several data stream index
template snippets.
With #59317, the `@timestamp` field defaults to a `date` field data type
for data streams.
This makes the data_stream timestamp field specification optional when
defining a composable template.
When none is specified, it defaults to `@timestamp`.
This adds a low-precedence mapping for the `@timestamp` field with
type `date`.
This will aid with the bootstrapping of data streams as a timestamp
mapping can be omitted when nanos precision is not needed.
Updates the 8.0 breaking changes to clarify that passwords for the removed
`kibana` user are not preserved for the replacement `kibana_system` users.
Closes #59353
This commit adds data stream info to the `/_xpack` and `/_xpack/usage` APIs. Currently the usage is
pretty minimal, returning only the number of data streams and the number of indices currently
abstracted by a data stream:
```
...
"data_streams" : {
  "available" : true,
  "enabled" : true,
  "data_streams" : 3,
  "indices_count" : 17
}
...
```
Changes:
* Swaps the `dev` admonitions for `experimental` admonitions
* Removes `ifdef` statements preventing the docs from appearing in
released branches
* We now have concurrent repository operations so the one at a time limit does not apply any longer
* Initialization was never slow solely due to loading information about all existing snapshots (though this contributed)
but also because two cluster state updates and a few writes to the repository had to happen before initialization could return
* Repo data necessary for a snapshot create operation is now cached on heap so loading it is effectively instant
* Snapshot initialization is just a single CS update now
* Initialization does no writes to the repository whatsoever
* Fixed missing `repository`
This adds a setting to data frame analytics jobs called
`max_number_threads`. The setting expects a positive integer.
When set, it specifies the maximum number of threads that may
be used by the analysis. Note that the actual number of threads
used is limited by the number of processors on the node where
the job is assigned. Also, the process may use a couple more threads
for operational functionality that is not the analysis itself.
This setting may also be updated for a stopped job.
More threads may reduce the time it takes to complete the job at the cost
of using more CPU.
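As a rough illustration of the cap described above, the sketch below computes how many threads the analysis itself could use. The function name and arguments are hypothetical and only model the rule stated in this message; they are not the actual implementation.

```python
def effective_analysis_threads(max_number_threads: int, node_processors: int) -> int:
    """Return the thread count the analysis may use (hypothetical helper).

    The setting must be a positive integer, and the result never exceeds
    the number of processors on the node where the job is assigned.
    """
    if max_number_threads < 1:
        raise ValueError("max_number_threads must be a positive integer")
    return min(max_number_threads, node_processors)
```

For example, a job configured with 8 threads on a 4-processor node would be limited to 4 analysis threads under this model.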
Changes:
* Documents the `size` default as `10`.
* Updates `size` param def to note its relation to pipes.
* Updates the `head` and `tail` pipe docs to modify sequences.
* Documents the `fetch_size` parameter.
Relates to #59014 and #59063
* Adding page for get snapshot API.
* Adding values for state and cleaning up some other formatting.
* Adding missing forward slash to GET request.
* Updating values for start_time and end_time in TESTRESPONSE.
* Swap "return" for "retrieve"
* Swap "return" for "retrieve" 2
* Change .snapshot to .response
* Adding response parameters and incorporating edits from review.
* Update response example to include repository info
* Change dash to underscore
* Add data type for snapshot in response
* Incorporating review comments and adding missing response definitions.
* Minor rewording in description.
ES EQL queries do not support the comparison of a variable, such as
a field value, to another variable.
This adds a related para and example to the EQL syntax docs.
The commit makes the following changes:
* The timestamp field of a data stream definition in a composable
index template can only be set to '@timestamp'.
* Removed the custom data stream timestamp field validation in favor of the validation from `TimestampFieldMapper`;
now we only check that the _timestamp field mapping has been defined on a backing index of a data stream.
* Moved code that injects _timestamp meta field mapping from `MetadataCreateIndexService#applyCreateIndexRequestWithV2Template58956(...)` method
to `MetadataIndexTemplateService#collectMappings(...)` method.
* Fixed a bug (#58956) that causes timestamp field validation to be performed
for each template instead of for the final mappings that are created.
* Only apply the _timestamp meta field if the index is created as part of a data stream or data stream rollover.
This fixes a docs test where a regular index creation matches (logs-*) a template with a data stream definition.
Relates to #58642
Relates to #53100
Closes #58956
Closes #58583
Today, we send operations in phase2 of peer recoveries batch by batch
sequentially. Normally that's okay as we should have a fairly small number of
operations in phase 2 due to the file-based threshold. However, if
phase1 takes a lot of time and we are actively indexing, then phase2 can
have a lot of operations to replay.
With this change, we will send multiple batches concurrently (defaults
to 1) to reduce the recovery time.
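The idea can be sketched as follows. This is a minimal illustration only: `send_batches`, its parameters, and the `send` callback are hypothetical stand-ins, not the recovery code itself.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence


def send_batches(
    operations: Sequence[int],
    batch_size: int,
    max_concurrent_batches: int,
    send: Callable[[Sequence[int]], int],
) -> List[int]:
    """Split operations into batches and send up to N batches at once.

    `send` stands in for the network call and returns the number of
    operations acknowledged for the batch. All names are hypothetical.
    """
    batches = [
        operations[i : i + batch_size]
        for i in range(0, len(operations), batch_size)
    ]
    # With max_concurrent_batches=1 this matches the old sequential behaviour.
    with ThreadPoolExecutor(max_workers=max_concurrent_batches) as pool:
        # pool.map returns results in batch order, even if sends overlap.
        return list(pool.map(send, batches))
```

Calling `send_batches(list(range(10)), 4, 2, len)` sends three batches of sizes 4, 4, and 2 with at most two in flight at a time.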
Small edit highlighting the fact that atomic cluster state change does not guarantee lack of errors for in-flight requests.
Co-authored-by: James Rodewig <james.rodewig@elastic.co>
This request:
```
POST /_search
{
  "aggs": {
    "a": {
      "adjacency_matrix": {
        "filters": {
          "1": {
            "terms": { "t": { "index": "lookup", "id": "1", "path": "t" } }
          }
        }
      }
    }
  }
}
```
Would fail with a 500 error and a message like:
```
{
  "error": {
    "root_cause": [
      {
        "type": "illegal_state_exception",
        "reason": "async actions are left after rewrite"
      }
    ]
  }
}
```
This fixes that by moving the query rewrite phase from a synchronous
call on the data nodes into the standard aggregation rewrite phase which
can properly handle the asynchronous actions.
Since 2.0.0 (56a264cf6d) we have documented that restoring a snapshot
typically results in `red` cluster health. However, since 5.0.0 (#19516)
this hasn't been true: we report `yellow` health for unassigned
primaries that will be recovered from a snapshot in the future. This
commit adjusts these docs to match today's behaviour.
* [DOCS] Combo version of ILM docs.
* [DOCS] Moved tutorial from Kibana.
* Adds documentation for index lifecycle policies (#28705)
* [DOCS] Adds documentation for index lifecycle policies
* [DOCS] Updated image for policy options to show all menu items
* Update create-policy.asciidoc
* [DOCS] Incorporated review comments on hot and warm phase
* [DOCS] Additional changes to warm phase
* [DOCS] Removed the word open in the warm phase
* Adds X-Pack icon for ILM (#34178)
* Add ILM tutorial (#59502)
* Add tutorial for ILM with filebeat
* Change screenshots and add additional steps
* Update screenshots, add numbered steps, and other minor edits
* Incorporate feedback: update links, formatting, and minor edits
* Move tip inline with list
* Apply suggestions from code review
Co-Authored-By: James Rodewig <james.rodewig@elastic.co>
* Move TIP inline . . . again
* Put TIP inline
Co-authored-by: James Rodewig <james.rodewig@elastic.co>
* Updates for navigation redesign (#68709)
* [DOCS] Updates for navigation redesign
* Getting started
* Set up text
* Discover
* Dashboard, Graph, ML, Maps, APM, SIEM, Dev tools
* Dev Tools, Stack Monitoring, Management
* Management
* Final changes
* [DOCS] Updates for navigation redesign
* [DOCS] Updates CCR monitoring screenshots
* updates SIEM screenshot and Cases overview text
* Added Brandon's APM image
* [DOCS] Refines CCR shard screenshot
* Removed merge conflict image file
Co-authored-by: lcawl <lcawley@elastic.co>
Co-authored-by: Ben Skelker <ben.skelker@elastic.co>
* [DOCS] Put API examples in collapsible sections like ML does
* Fix include
* Added tutorial images
* Fixed images
* Add short title for FB tutorial
* Add missing files
* Incorporate review feedback
* review feedback
* Incorporated review feedback
Co-authored-by: gchaps <33642766+gchaps@users.noreply.github.com>
Co-authored-by: Lisa Cawley <lcawley@elastic.co>
Co-authored-by: Melori Arellano <melori@elastic.co>
Co-authored-by: James Rodewig <james.rodewig@elastic.co>
Co-authored-by: Kaarina Tungseth <kaarina.tungseth@elastic.co>
Co-authored-by: Ben Skelker <ben.skelker@elastic.co>
Part of #48366. Add documentation for the dangling indices
API added in #58176.
Co-authored-by: David Turner <david.turner@elastic.co>
Co-authored-by: Adam Locke <adam.locke@elastic.co>
* Adding ESS icons to supported ES settings.
* Adding new file for supported ESS settings.
* Adding supported ESS settings for HTTP and disk-based shard allocation.
* Adding more supported settings for ESS.
* Adding descriptions for each Cloud section, plus additional settings.
* Adding new warehouse file for Cloud, plus additional settings.
* Adding node settings for Cloud.
* Adding audit settings for Cloud.
* Resolving merge conflict.
* Adding SAML settings (part 1).
* Adding SAML realm encryption and signing settings.
* Adding SAML SSL settings.
* Adding Kerberos realm settings.
* Adding OpenID Connect Realm settings.
* Adding OpenID Connect SSL settings.
* Resolving leftover Git merge markers.
* Removing Cloud settings page and link to it.
* Add link to mapping source
* Update docs/reference/docs/reindex.asciidoc
* Incorporate edit of HTTP settings
* Remove "cloud" from tag and ID
* Remove "cloud" from tag and update description
* Remove "cloud" from tag and ID
* Change "whitelists" to "specifies"
* Remove "cloud" from end tag
* Removing cloud from IDs and tags.
* Changing link reference to fix build issue.
* Adding index management page for missing settings.
* Removing warehouse file for Cloud and moving settings elsewhere.
* Clarifying true/false usage of http.detailed_errors.enabled.
* Changing underscore to dash in link to fix ci build.
This commit adds a new metadata field mapper that validates
that a document has exactly one timestamp value in the data stream timestamp field and
that the timestamp field mapping only has `type`, `meta` or `format` attributes configured.
Other attributes can affect the guarantee that an index with this meta field mapper has a
useable timestamp field.
The MetadataCreateIndexService inserts a data stream timestamp field mapper whenever
a new backing index of a data stream is created.
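The two checks described above can be sketched like this. It is an illustration of the rules as stated in this message, using hypothetical function names and plain dictionaries rather than the mapper's real types.

```python
from typing import Any, Dict, List

# Attributes the message says may be configured on the timestamp field.
ALLOWED_MAPPING_ATTRIBUTES = {"type", "meta", "format"}


def validate_timestamp_mapping(mapping: Dict[str, Any]) -> None:
    """Reject any attribute other than type, meta, or format (hypothetical)."""
    extra = set(mapping) - ALLOWED_MAPPING_ATTRIBUTES
    if extra:
        raise ValueError(
            f"unsupported attributes on timestamp field: {sorted(extra)}"
        )


def validate_timestamp_values(values: List[Any]) -> None:
    """Require exactly one timestamp value per document (hypothetical)."""
    if len(values) != 1:
        raise ValueError(
            f"expected exactly one timestamp value, got {len(values)}"
        )
```

Under this model, a mapping such as `{"type": "date", "format": "strict_date_optional_time"}` passes, while one that also sets another attribute, or a document with zero or several timestamp values, is rejected.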
Relates to #53100
* Forbid read-only-allow-delete block in blocks API
The read-only-allow-delete block is not really under the user's control
since Elasticsearch adds/removes it automatically. This commit removes
support for it from the new API for adding blocks to indices that was
introduced in #58094.
* Missing xref
* Reword paragraph on read-only-allow-delete block
Today the disk-based shard allocator accounts for incoming shards by
subtracting the estimated size of the incoming shard from the free space on the
node. This is an overly conservative estimate if the incoming shard has almost
finished its recovery since in that case it is already consuming most of the
disk space it needs.
This change adds to the shard stats a measure of how much larger each store is
expected to grow, computed from the ongoing recovery, and uses this to account
for the disk usage of incoming shards more accurately.
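A small sketch of the difference between the two estimates, assuming hypothetical names and a simplified model in which expected growth is the estimated final size minus the bytes already on disk:

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class IncomingShard:
    estimated_total_bytes: int  # expected final size of the shard
    bytes_on_disk: int          # already written by the ongoing recovery

    @property
    def expected_growth_bytes(self) -> int:
        # How much larger the store is still expected to grow.
        return max(0, self.estimated_total_bytes - self.bytes_on_disk)


def conservative_free_bytes(free_bytes: int, shards: Iterable[IncomingShard]) -> int:
    """Old estimate: subtract the full estimated size of every incoming shard."""
    return free_bytes - sum(s.estimated_total_bytes for s in shards)


def adjusted_free_bytes(free_bytes: int, shards: Iterable[IncomingShard]) -> int:
    """New estimate: subtract only the growth that has not happened yet."""
    return free_bytes - sum(s.expected_growth_bytes for s in shards)
```

With 100 bytes free and one incoming shard of 60 bytes that has already written 50, the old estimate leaves 40 bytes while the adjusted one leaves 90, because the measured free space already reflects the 50 bytes on disk.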
Adds an explicit check to `variable_width_histogram` to stop it from
trying to collect from many buckets because it can't. I tried to make it
do so but that is more than an afternoon's project, sadly. So for now we
just disallow it.
Relates to #42035
This commit adds the component and composable templates, as well as ILM policies, for the new
default indexing strategy. It installs:
- logs-default-mappings (component)
- logs-default-settings (component)
- logs-default-policy (ilm policy)
- logs-default-template (composable template)
- metrics-default-mappings (component)
- metrics-default-settings (component)
- metrics-default-policy (ilm policy)
- metrics-default-template (composable template)
These templates and policies are managed by a new x-pack module, `stack`, and can be disabled by
setting `stack.templates.enabled` to `false`.
These ensure that patterns for the `logs-*-*` and `metrics-*-*` indices are set up to create data
streams with the proper mappings and settings.
This also makes changes to the `IndexTemplateRegistry` to support installing component and
composable templates (previously it supported only legacy templates).
Resolves #56709
This commit adds conditional logic to the docs to avoid including any
docs on searchable snapshots in released versions.
Rework of #58556 which was reverted.
Restoring from a snapshot (which is a particular form of recovery) does not currently take recovery throttling into account
(i.e. the `indices.recovery.max_bytes_per_sec` setting). While restores are subject to their own throttling (repository
setting `max_restore_bytes_per_sec`), this repository setting does not allow for values to be configured differently on a
per-node basis. As restores are very similar in nature to peer recoveries (streaming bytes to the node), it makes sense to
configure throttling in a single place.
The `max_restore_bytes_per_sec` setting is also changed to default to unlimited now, whereas previously it was set to
`40mb` (which is the current default of `indices.recovery.max_bytes_per_sec`). This means that no behavioral change
will be observed by clusters where the recovery and restore settings were not adapted.
Relates to https://github.com/elastic/elasticsearch/issues/57023
Co-authored-by: James Rodewig <james.rodewig@elastic.co>
Adds an API for putting an index block in place. For write blocks, it also ensures that, once the call successfully returns to
the user, all shards of the index properly account for the block, for example that all in-flight writes to the index have
completed after adding the write block.
This API allows coordinating more complex workflows, where it is crucial that an index is no longer receiving writes after
the API completes, useful for example when marking an index as read-only during an upgrade in order to reindex its
documents.
* Adding create index snapshot API page.
* Condense API description.
* Remove parameter from query.
* Add POST method and remove `-name` from the snapshot variable.
* Expand description of `<snapshot>`.
* Add data streams to introduction and expand the overall description.
* Add support for data streams.
* Add support for data streams.
* Add data stream and reference for "point-in-time view".
* Add data streams.
* Change `my_backup` to `my_repository`.
* Add description of boolean options for `wait_for_completion` parameter.
* Change command --> response
* Clarify `indices` parameter description
* Update `ignore-unavailable` parameter description
* Reword example description
* Remove "index" from API name
* Incorporating review comments from James R.
* Adding a much better request + response
* Clarify `include_global_state` description
* Incorporating additional edits.
* Changing my_backup to my_repository in example.
* Update snippet test to avoid failures
* Update TESTRESPONSE snippets
* Remove errant space
* Removing the parameter per reviewer comments
Replaces `composable index template` and `composable template` with
`index template` throughout data stream-related docs.
`Composable index template` is only used to contrast with legacy index
templates.
Removes references to partial results from the async EQL search docs.
If an EQL search does not complete during the `wait_for_completion_timeout`
timeout period, it returns no results.
Adds parsing of `status` and `increased_memory_estimate_bytes`
to data frame analytics `memory_usage`. When the training surpasses
the model memory limit, the status will be set to `hard_limit` and
`increased_memory_estimate_bytes` can be used to update the job's
limit in order to restart the job.
Changes the titles for tokenizer pages to sentence case.
Also moves the 'Path hierarchy tokenizer examples' page within the
'Path hierarchy tokenizer' page and adds a related redirect.
* Fix: preserve URI query and fragment char escaping
This commit fixes an issue emerging when the connection string URI
contains escaped characters.
The original URI is pre-parsed in order to re-assemble a new URI having
the optional elements filled in with defaults. However, the new URI has been
using the unescaped query and fragment parts. So if these
contained any escaped `&` or `=` (such as in the password option value),
the unescaping would reveal them and make them later interfere with the
options parsing.
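The effect can be reproduced with Python's standard `urllib.parse`, used here purely as an illustration. The URI, host, and option names are made up, and the snippet is not the driver's code; it only contrasts re-assembling a URI from the unescaped query with re-assembling it from the raw one.

```python
from urllib.parse import parse_qs, unquote, urlsplit, urlunsplit

# Hypothetical connection URI whose password value is "p&ss=word", escaped.
original = "https://localhost:9200/?user=admin&password=p%26ss%3Dword"
parts = urlsplit(original)

# Re-assembling from the unescaped query exposes the "&" and "=".
from_unescaped = urlunsplit(
    (parts.scheme, parts.netloc, parts.path, unquote(parts.query), parts.fragment)
)

# Re-assembling from the raw (still escaped) query keeps them protected.
from_raw = urlunsplit(
    (parts.scheme, parts.netloc, parts.path, parts.query, parts.fragment)
)

broken_options = parse_qs(urlsplit(from_unescaped).query)
intact_options = parse_qs(urlsplit(from_raw).query)
```

Here `broken_options` contains a truncated password `p` plus a spurious `ss` option, while `intact_options` yields the full password `p&ss=word`.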
The commit changes that, so that the new URI is built from the still-escaped
"raw" parts of the original URI.
We're tracking this aggregation's experimental-progress in #58573. We'd
like a little time to be able to make backwards incompatible changes to
the aggregation because we're not 100% sure about the request and
response format yet.
With #58096, data streams now track the timestamp field mapping outside
of the template associated with the stream. This means you can no longer
update the timestamp field mapping using template changes.
This updates the associated data stream docs.
Today we have individual settings for configuring node roles such as
node.data and node.master. Additionally, roles are pluggable and we have
used this to introduce roles such as node.ml and node.voting_only. As
the number of roles is growing, managing these becomes harder for the
user. For example, to create a master-only node, today a user has to
configure:
- node.data: false
- node.ingest: false
- node.remote_cluster_client: false
- node.ml: false
at a minimum if they are relying on defaults, but also add:
- node.master: true
- node.transform: false
- node.voting_only: false
if they want to be explicit. This is also challenging in cases where a
user wants to configure a coordinating-only node which requires
disabling all roles, a list which we are adding to, requiring the user
to keep checking whether a node has acquired any of these roles.
This commit addresses this by adding a list setting node.roles for which
a user has explicit control over the list of roles that a node has. If
the setting is configured, the node has exactly the roles in the list,
and not any additional roles. This means to configure a master-only
node, the setting is merely 'node.roles: [master]', and to configure a
coordinating-only node, the setting is merely: 'node.roles: []'.
With this change we deprecate the existing 'node.*' settings such as
'node.data'.
Introduces a new method on `MappedFieldType` to return a family type name which defaults to the field type.
Changes `wildcard` and `constant_keyword` field types to return `keyword` for field capabilities.
Relates to #53175
Implements a new histogram aggregation called `variable_width_histogram` which
dynamically determines bucket intervals based on document groupings. These
groups are determined by running a one-pass clustering algorithm on each shard
and then reducing each shard's clusters using an agglomerative
clustering algorithm.
This PR addresses #9572.
The shard-level clustering is done in one pass to minimize memory overhead. The
algorithm was lightly inspired by
[this paper](https://ieeexplore.ieee.org/abstract/document/1198387). It fetches
a small number of documents to sample the data and determine initial clusters.
Subsequent documents are then placed into one of these clusters, or a new one
if they are an outlier. This algorithm is described in more detail in the
aggregation's docs.
At reduce time, a
[hierarchical agglomerative clustering](https://en.wikipedia.org/wiki/Hierarchical_clustering)
algorithm inspired by [this paper](https://arxiv.org/abs/1802.00304)
continually merges the closest buckets from all shards (based on their
centroids) until the target number of buckets is reached.
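A minimal sketch of such a reduce step is shown below. The `Bucket` class, the function names, and the count-weighted centroid used when merging are assumptions for illustration; the aggregation's real data structures may differ.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Bucket:
    centroid: float
    doc_count: int
    min_value: float
    max_value: float


def merge(a: Bucket, b: Bucket) -> Bucket:
    """Merge two buckets; the count-weighted centroid is an assumption."""
    total = a.doc_count + b.doc_count
    centroid = (a.centroid * a.doc_count + b.centroid * b.doc_count) / total
    return Bucket(
        centroid,
        total,
        min(a.min_value, b.min_value),
        max(a.max_value, b.max_value),
    )


def reduce_buckets(buckets: List[Bucket], target: int) -> List[Bucket]:
    """Repeatedly merge the two closest buckets until `target` remain."""
    if target < 1:
        raise ValueError("target must be at least 1")
    merged = sorted(buckets, key=lambda b: b.centroid)
    while len(merged) > target:
        # In one dimension the closest pair is always adjacent once sorted.
        i = min(
            range(len(merged) - 1),
            key=lambda j: merged[j + 1].centroid - merged[j].centroid,
        )
        merged[i : i + 2] = [merge(merged[i], merged[i + 1])]
    return merged
```

Because a weighted centroid always lies between the two centroids it replaces, the list stays sorted after each merge and does not need re-sorting.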
The final values produced by this aggregation are approximate. Each bucket's
min value is used as its key in the histogram. Furthermore, buckets are merged
based on their centroids and not their bounds. So it is possible that adjacent
buckets will overlap after reduction. Because each bucket's key is its min,
this overlap is not shown in the final histogram. However, when such overlap
occurs, we set the key of the bucket with the larger centroid to the midpoint
between its minimum and the smaller bucket’s maximum:
`min[large] = (min[large] + max[small]) / 2`. This heuristic is expected to
increase the accuracy of the clustering.
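The midpoint adjustment can be written as a small helper. The function name and signature are hypothetical; only the formula comes from the description above.

```python
def adjust_overlapping_key(small_max: float, large_min: float) -> float:
    """Return the key for the bucket with the larger centroid.

    If that bucket's minimum falls below the smaller bucket's maximum,
    the two overlap and the key moves to the midpoint of the overlap.
    Otherwise the bucket keeps its minimum as its key.
    """
    if large_min < small_max:
        return (large_min + small_max) / 2
    return large_min
```

For instance, if the smaller bucket's maximum is 10 and the larger bucket's minimum is 8, the larger bucket's key becomes 9.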
Nodes are unable to share centroids during the shard-level clustering phase. In
the future, resolving https://github.com/elastic/elasticsearch/issues/50863
would let us solve this issue.
It doesn’t make sense for this aggregation to support the `min_doc_count`
parameter, since clusters are determined dynamically. The `order` parameter is
not supported here to keep this large PR from becoming too complex.
When a local model is constructed, the cache hit miss count is incremented.
When a user calls _stats, we will include the sum of the cache hit miss counts across ALL nodes. This statistic is important when comparing against the inference_count. If the cache hit miss count is near the inference_count, it indicates that the cache is overburdened or inappropriately configured.
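One way to read the statistic is as a ratio of misses to inferences. The helper below is a hypothetical sketch of that comparison, taking per-node miss counts as a plain list; it does not reflect the actual _stats response format.

```python
from typing import Iterable


def cache_miss_ratio(per_node_miss_counts: Iterable[int], inference_count: int) -> float:
    """Sum the miss counts across nodes and compare to inference_count.

    A value near 1.0 suggests most inferences had to construct a local
    model, which hints at an overburdened or misconfigured cache.
    """
    if inference_count == 0:
        return 0.0
    return sum(per_node_miss_counts) / inference_count
```

For example, miss counts of 40 and 50 on two nodes against 100 inferences give a ratio of 0.9.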