Removes the experimental status for the frozen tier / shared_cache searchable snapshots for the 7.13 release.
Also updates the docs to note that URL repositories are supported for searchable snapshots as of 7.13.
This commit adds some per-index statistics to the `SnapshotInfo` blob:
- number of shards
- total size in bytes
- maximum number of segments per shard
It also exposes these statistics in the get snapshot API.
- adds a bit more overview on the process, including noting that it
works in terms of files
- notes that the snapshot is a point-in-time view of each shard, and not
necessarily exactly at the start of the snapshot process
- documents the `snapshot.max_concurrent_operations` setting
Co-authored-by: James Rodewig <40268737+jrodewig@users.noreply.github.com>
The endpoint `_snapshottable_features` is long and implies incorrect
things about this API - it is used not just for snapshots, but also for
the upcoming reset API. Following discussions on the team, this commit
changes the endpoint to `_features` and removes the connection between
this API and snapshots, as snapshots are not the only use for the output
of this API.
Today we rely on blob stores behaving in a certain way so that they can be used
as a snapshot repository. There are an increasing number of third-party blob
stores that claim to be S3-compatible, but which may not offer a suitably
correct or performant implementation of the S3 API. We rely on some subtle
semantics with concurrent readers and writers, but some blob stores may not
implement them correctly. Hitting a corner case in the implementation may be rare
in normal use, and may be hard to reproduce or to distinguish from an
Elasticsearch bug.
This commit introduces a new `POST /_snapshot/.../_analyse` API which exercises
the more problematic corners of the repository implementation looking for
correctness bugs and measures the details of the performance of the repository
under concurrent load.
This PR expands the meaning of `include_global_state` for snapshots to include system indices. If `include_global_state` is `true` on creation, system indices will be included in the snapshot regardless of the contents of the `indices` field. If `include_global_state` is `true` on restoration, system indices will be restored (if included in the snapshot), regardless of the contents of the `indices` field. Index renaming is not applied to system indices, as system indices rely on their names matching certain patterns. If restored system indices are already present, they are automatically deleted prior to restoration from the snapshot to avoid conflicts.
This behavior can be overridden to an extent by including a new field in the snapshot creation or restoration call, `feature_states`, which contains an array of strings indicating the "feature" for which system indices should be snapshotted or restored. For example, this call will only restore the `watcher` and `security` system indices (in addition to `index_1`):
```
POST /_snapshot/my_repository/snapshot_2/_restore
{
  "indices": "index_1",
  "include_global_state": true,
  "feature_states": ["watcher", "security"]
}
```
If `feature_states` is present, the system indices associated with those features will be snapshotted or restored regardless of the value of `include_global_state`. All system indices can be omitted by providing a special value of `none` (`"feature_states": ["none"]`), or included by omitting the field or explicitly providing an empty array (`"feature_states": []`), similar to the `indices` field.
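The selection rules above can be sketched as a small function. This is an illustrative model only, not the Elasticsearch implementation: the function name and its `available` parameter are hypothetical, and the fallback to `include_global_state` when `feature_states` is omitted or empty is an assumption drawn from the description above.

```python
from typing import Iterable, Optional, Set


def resolve_feature_states(
    available: Iterable[str],
    include_global_state: bool,
    feature_states: Optional[Iterable[str]] = None,
) -> Set[str]:
    """Return the features whose system indices would be snapshotted or restored."""
    requested = list(feature_states or [])

    # The special value ["none"] omits all system indices.
    if requested == ["none"]:
        return set()

    # An explicit list wins regardless of include_global_state.
    if requested:
        return set(requested)

    # Field omitted or empty: assumed to fall back to include_global_state.
    return set(available) if include_global_state else set()


features = ["tasks", "kibana", "watcher", "security"]
print(sorted(resolve_feature_states(features, True, ["watcher", "security"])))
print(sorted(resolve_feature_states(features, True, ["none"])))
print(sorted(resolve_feature_states(features, True, [])))
```

The three calls cover the explicit list, the `none` value, and the empty array.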
The list of currently available features can be retrieved via a new "Get Snapshottable Features" API:
```
GET /_snapshottable_features
```
which returns a response of the form:
```
{
  "features": [
    {
      "name": "tasks",
      "description": "Manages task results"
    },
    {
      "name": "kibana",
      "description": "Manages Kibana configuration and reports"
    }
  ]
}
```
Features currently map one-to-one with `SystemIndexPlugin`s, but this should be considered an implementation detail. The Get Snapshottable Features API and snapshot creation rely upon all relevant plugins being installed on the master node.
Further, the list of feature states included in a given snapshot is exposed by the Get Snapshot API, which now includes a new field, `feature_states`, which contains a list of the feature states and their associated system indices which are included in the snapshot. All system indices in feature states are also included in the `indices` array for backwards compatibility, although explicitly requesting system indices included in a feature state is deprecated. For example, an excerpt from the Get Snapshot API showing `feature_states`:
```
"feature_states": [
  {
    "feature_name": "tasks",
    "indices": [
      ".tasks"
    ]
  }
],
"indices": [
  ".tasks",
  "test1",
  "test2"
]
```
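Because system indices appear both under `feature_states` and in `indices`, a client that wants only the regular indices has to subtract one from the other. A minimal sketch, assuming the excerpt above is wrapped in an enclosing JSON object (the helper name `split_indices` is hypothetical):

```python
import json

# The excerpt above, wrapped in braces so that it parses as a JSON object.
excerpt = json.loads("""
{
  "feature_states": [
    {"feature_name": "tasks", "indices": [".tasks"]}
  ],
  "indices": [".tasks", "test1", "test2"]
}
""")


def split_indices(snapshot: dict) -> tuple:
    """Separate system indices (listed under feature_states) from the rest."""
    system = {
        index
        for feature in snapshot.get("feature_states", [])
        for index in feature["indices"]
    }
    regular = [index for index in snapshot["indices"] if index not in system]
    return sorted(system), regular


system_indices, regular_indices = split_indices(excerpt)
print(system_indices)   # ['.tasks']
print(regular_indices)  # ['test1', 'test2']
```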
Co-authored-by: William Brafford <william.brafford@elastic.co>
This commit spells out how important repository reliability is to
searchable snapshots, and also documents a procedure for taking a backup
of a snapshot repository.
Relates #54944
Today a snapshot repository does not have a well-defined identity. It
can be reregistered with a different cluster under a different name, and
can even be registered with multiple clusters in readonly mode.
This presents problems for cases where we need to refer to a specific
snapshot in a globally-unique fashion. Today we rely on the repository
being registered under the same name on every cluster, but this is not a
safe assumption.
This commit adds a UUID that can be used to uniquely identify a
repository. The UUID is stored in the top-level index blob, represented
by `RepositoryData`, and is also usually copied into the
`RepositoryMetadata` that represents the repository in the cluster
state. The repository UUID is exposed in the get-repositories API; other
more meaningful consumers will be added in due course.
In #33102 we added a warning against using filesystem backups.
Experience has shown that the wording we added was insufficiently
general and open to misinterpretation. This commit reworks it to be
clearer.
This commit also clarifies that snapshots are not incremental across
repositories.
Today we describe snapshots as "incremental" but their incrementality is
a rather different beast from e.g. incremental filesystem backups. With
traditional backups you take a large and relatively infrequent "full"
backup and then a sequence of smaller "incremental" ones, and this whole
sequence of backups is required for a restore so it must be kept around
until at least the next full backup. In contrast, Elasticsearch
snapshots are logically independent and each can be deleted without
affecting the integrity of the others.
This distinction frequently causes confusion amongst newer users, so
this commit clarifies what we mean by "incremental" in the docs.
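The following toy model illustrates that distinction. It is a sketch under simplifying assumptions, not how the repository is implemented: each snapshot is modelled as a set of file names, a new snapshot uploads only the files the repository lacks, and deleting a snapshot removes only files no other snapshot references. The class and file names are hypothetical.

```python
from typing import Dict, Set


class ToyRepository:
    def __init__(self) -> None:
        self.blobs: Set[str] = set()              # files stored in the repository
        self.snapshots: Dict[str, Set[str]] = {}  # snapshot name -> files it references

    def create(self, name: str, files: Set[str]) -> Set[str]:
        """Take a snapshot; return only the files that had to be uploaded."""
        uploaded = set(files) - self.blobs
        self.blobs |= uploaded
        self.snapshots[name] = set(files)
        return uploaded

    def delete(self, name: str) -> Set[str]:
        """Delete a snapshot; return the files removed from the repository."""
        files = self.snapshots.pop(name)
        still_referenced = set().union(*self.snapshots.values())
        removed = files - still_referenced
        self.blobs -= removed
        return removed

    def restorable(self, name: str) -> bool:
        """A snapshot is restorable if every file it references is still stored."""
        return self.snapshots[name] <= self.blobs


repo = ToyRepository()
first_upload = repo.create("snap_1", {"seg_a", "seg_b"})
second_upload = repo.create("snap_2", {"seg_b", "seg_c"})  # only seg_c is new
removed = repo.delete("snap_1")                            # seg_b is still referenced
print(sorted(second_upload), sorted(removed), repo.restorable("snap_2"))
```

In this model the second snapshot is cheap to take because it shares `seg_b`, yet it remains fully restorable after the first one is deleted.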
Adds a limit to the maximum number of snapshots that are allowed
to be added to a snapshot repository as a safety measure of last resort
against repositories that grow to an unmanageable size due to e.g. incorrect SLM
settings.
Co-authored-by: David Turner <david.turner@elastic.co>
Removing some now outdated statements that refer to a time
when snapshot operations could not run concurrently.
Closes #61680
Co-authored-by: Adam Locke <adam.locke@elastic.co>
* Adds table with icons for simplicity.
* Updating table for clarity.
* Changing table formatting and incorporating more feedback.
* Changing table alignment.
* Adding new page for restore snapshot API.
* Improving test cases, lots of edits, and streamlining content.
* Incorporating review suggestions and feedback.
* Specify `index alias` vs `alias`
* Change parameter order
* Provide clarity around regular expression
* Add link to SLM parameters
* Split sentences in example
* Adding link to master node page.
The clock resolution for this API is our default 200ms. It is unlikely but
possible that a shard snapshot starts and ends on separate clock ticks and that breaks the test.
Just allowing any value here seems fine to me (seems we can't match for integer specifically).
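To see why a fixed expected value is fragile, consider a clock that only advances in 200ms steps. The sketch below assumes the coarse clock rounds down to the most recent tick. That is a simplification for illustration, and the function names are hypothetical.

```python
RESOLUTION_MS = 200  # clock resolution mentioned above


def coarse_time(true_ms: int) -> int:
    """Time as reported by a clock that only advances every RESOLUTION_MS."""
    return (true_ms // RESOLUTION_MS) * RESOLUTION_MS


def measured_duration(start_ms: int, end_ms: int) -> int:
    """Duration computed from two coarse clock readings."""
    return coarse_time(end_ms) - coarse_time(start_ms)


# A 5ms operation within one tick is reported as taking 0ms.
print(measured_duration(10, 15))    # 0
# A 2ms operation that straddles a tick boundary is reported as 200ms.
print(measured_duration(199, 201))  # 200
```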
* Updating snapshot/restore pages to align with API changes.
* Fixing texts in delete snapshot page.
* Removing duplicate code sample and making editorial changes.
* Change "deleted" to "delete"
* Incorporating review feedback and making minor editorial changes.
* Remove titleabbrev
* Add paragraph break
* Remove titleabbrev from restore page
* Remove titleabbrev from create page
* Change "Create" to lowercase
* Change API names to lowercase
* Remove extraneous delimiters
* Change "Delete" to lowercase
* Single-sourcing warning and clarifying warning text.
We can't just assume a fixed number for the overall file count.
Depending on how the merging/flushing works out we won't always have
4 files for the index across all versions, systems etc.
Also, we could have x-pack concurrently create some system indices
which could mess up the total numbers here.
Fixed by only snapshotting a single index+shard in the snapshot that
we get the status for and verifying consistency instead of equality
for total file counts.
Closes #59767
* Adding get snapshot status API docs.
* Adding more fields and a link to the new page.
* Adding missing spaces in TESTRESPONSES
* Adding more parameters and making some edits.
* Marking snapshot as optional
* Marking repository as optional
* Add data type for stats
* Add data type for shard_stats
* Incorporating review feedback.
* Lots of review feedback incorporated.
* Fixing tests to unbreak CI builds.
* Changing indices to index.
* We now have concurrent repository operations so the one at a time limit does not apply any longer
* Initialization was never slow solely due to loading information about all existing snapshots (though this contributed)
but also because two cluster state updates and a few writes to the repository had to happen before initialization could return
* Repo data necessary for a snapshot create operation is now cached on heap so loading it is effectively instant
* Snapshot initialization is just a single CS update now
* Initialization does no writes to the repository whatsoever
* Fixed missing `repository`
* Adding page for get snapshot API.
* Adding values for state and cleaning up some other formatting.
* Adding missing forward slash to GET request.
* Updating values for start_time and end_time in TESTRESPONSE.
* Swap "return" for "retrieve"
* Swap "return" for "retrieve" 2
* Change .snapshot to .response
* Adding response parameters and incorporating edits from review.
* Update response example to include repository info
* Change dash to underscore
* Add data type for snapshot in response
* Incorporating review comments and adding missing response definitions.
* Minor rewording in description.
Since 2.0.0 (56a264cf6d) we have documented that restoring a snapshot
typically results in `red` cluster health. However, since 5.0.0 (#19516)
this hasn't been true: we report `yellow` health for unassigned
primaries that will be recovered from a snapshot in the future. This
commit adjusts these docs to match today's behaviour.