Found this to be the easiest fix, the alternative would have been to actually
wait for all snapshot meta threads to become blocked but that's kind of hacky.
closes#74743
This PR returns the get snapshots API to the 7.x format (and transport client behavior) and enhances it for requests that ask for multiple repositories.
The changes for requests that target multiple repositories are:
* Add `repository` field to `SnapshotInfo` and REST response
* Add `failures` map alongside `snapshots` list instead of returning just an exception response as done for single repo requests
* Pagination now works across repositories instead of being per repository for multi-repository requests
closes#69108closes#43462
Pagination and snapshots for get snapshots API, build on top of the current implementation to enable work that needs this API for testing. A follow-up will leverage the changes to make things more efficient via pagination.
Relates https://github.com/elastic/elasticsearch/pull/73570 which does part of the under-the-hood changes required to efficiently implement this API on the repository layer.
When libs/core was created, several classes were moved from server's
o.e.common package, but they were not moved to a new package. Split
packages need to go away long term, so that Elasticsearch can even think
about modularization. This commit moves all the classes under o.e.common
in core to o.e.core.
relates #73784
Same as #72644. This is a much longer running action than normal
get snapshots even so it should definitely be cancellable.
Parallelization for this action will be introduced in a separate PR.
Extract usage of internal API from TestClustersPlugin and PluginBuildPlugin and related plugins and build logic
This includes a refactoring of ElasticsearchDistribution to handle types
better in a way we can differentiate between supported Elasticsearch
Distribution types supported in TestCkustersPlugin and types only supported
in internal plugins.
It also introduces a set of internal versions of public plugins.
As part of this we also generate the plugin descriptors now.
As a follow up on this we can actually move these public used classes into
an extra project (declared as included build)
We keep LoggedExec and VersionProperties effectively public And workaround for RestTestBase
If this runs needlessly for large repositories (especially in timeout/retry situations)
it's a significant memory+cpu hit => made it cancellable like we recently did for many
other endpoints.
* Warn users if security is implicitly disabled
Elasticsearch has security features implicitly disabled by default for
Basic and Trial licenses, unless explicitly set in the configuration
file.
This may be good for onboarding, but it also lead to unintended insecure
clusters.
This change introduces clear warnings when security features are
implicitly disabled.
- a warning header in each REST response if security is implicitly
disabled;
- a log message during cluster boot.
System index descriptors are used to describe a system index, which are
expected to change as new versions are developed. As part of this, the
descriptors had a minimum supported version field so that the contents
within that descriptor would not be applied if there were nodes older
than that version. However, this falls short of being able to
accurately describe what a system index should look like in a given
cluster where there are mixed node versions.
This change moves us towards being able to accurately describe and
know what the system index should look like. A system index is now
able to accept a list of the prior system index descriptor objects
so that clusters with mixed versions can select the appropriate
descriptor and ensure the index is created properly. As the node
versions change during a rolling upgrade, the cluster will then be
able to adapt the system index to the most recent version once all
master and data nodes have been upgraded.
Co-authored-by: Tim Vernum <tim@adjective.org>
Co-authored-by: Yang Wang <ywangd@gmail.com>
Today by default the `MANAGEMENT` threadpool always permits 5 threads
even if the node has a single CPU, which unfairly prioritises management
activities on small nodes. With this commit we limit the size of this
threadpool to the number of processors if less than 5.
Relates #70435
Today when creating an internal test cluster, we allow the test to
supply the node settings that are applied. The extension point to
provide these settings has a single integer parameter, indicating the
index (zero-based) of the node being constructed. This allows the test
to make some decisions about the settings to return, but it is too
simplistic. For example, imagine a test that wants to provide a setting,
but some values for that setting are not valid on non-data nodes. Since
the only information the test has about the node being constructed is
its index, it does not have sufficient information to determine if the
node being constructed is a non-data node or not, since this is done by
the test framework externally by overriding the final settings with
specific settings that dicate the roles of the node. This commit changes
the test framework so that the test has information about what settings
are going to be overriden by the test framework after the test provide
its test-specific settings. This allows the test to make informed
decisions about what values it can return to the test framework.
This commit introduces system index types that will be used to
differentiate behavior. Previously system indices were all treated the
same regardless of whether they belonged to Elasticsearch, a stack
component, or one of our solutions. Upon further discussion and
analysis this decision was not in the best interest of the various
teams and instead a new type of system index was needed. These system
indices will be referred to as external system indices. Within external
system indices, an option exists for these indices to be managed by
Elasticsearch or to be managed by the external product.
In order to represent this within Elasticsearch, each system index will
have a type and this type will be used to control behavior.
Closes#67383
Opening a Lucene index that supports soft-deletes currently creates the liveDocs bitset eagerly. This requires scanning
the doc values to materialize the liveDocs bitset from the soft-delete doc values. In order for searchable snapshot shards
to be available for searches as quickly as possible (i.e. on recovery, or in case of FrozenEngine whenever a search comes
in), they should read as little as possible from the Lucene files.
This commit introduces a LazySoftDeletesDirectoryReaderWrapper, a variant of Lucene's
SoftDeletesDirectoryReaderWrapper that loads the livedocs bitset lazily on first access. It is special-tailored to
ReadOnlyEngine / FrozenEngine as it only operates on non-NRT readers.
A small followup to #67413 and #68965: the underlying actions of the
`GET /_cat/segments` API are now cancellable, so we may as well cancel
them if needed.
The response to an `IndicesSegmentsAction` might be large, perhaps 10s
of MBs of JSON, and today it is serialized on a transport thread. It
also might take so long to respond that the client times out, resulting
in the work needed to compute the response being wasted.
This commit introduces the `DispatchingRestToXContentListener` which
dispatches the work of serializing an `XContent` response to a
non-transport thread, and also makes `TransportBroadcastByNodeAction`
sensitive to the cancellability of its tasks.
It uses these two features to make the `RestIndicesSegmentsAction`
serialize its response on a `MANAGEMENT` thread, and to abort its work
more promptly if the client's channel is closed before the response is
sent.
This PR expands the meaning of `include_global_state` for snapshots to include system indices. If `include_global_state` is `true` on creation, system indices will be included in the snapshot regardless of the contents of the `indices` field. If `include_global_state` is `true` on restoration, system indices will be restored (if included in the snapshot), regardless of the contents of the `indices` field. Index renaming is not applied to system indices, as system indices rely on their names matching certain patterns. If restored system indices are already present, they are automatically deleted prior to restoration from the snapshot to avoid conflicts.
This behavior can be overridden to an extent by including a new field in the snapshot creation or restoration call, `feature_states`, which contains an array of strings indicating the "feature" for which system indices should be snapshotted or restored. For example, this call will only restore the `watcher` and `security` system indices (in addition to `index_1`):
```
POST /_snapshot/my_repository/snapshot_2/_restore
{
"indices": "index_1",
"include_global_state": true,
"feature_states": ["watcher", "security"]
}
```
If `feature_states` is present, the system indices associated with those features will be snapshotted or restored regardless of the value of `include_global_state`. All system indices can be omitted by providing a special value of `none` (`"feature_states": ["none"]`), or included by omitting the field or explicitly providing an empty array (`"feature_states": []`), similar to the `indices` field.
The list of currently available features can be retrieved via a new "Get Snapshottable Features" API:
```
GET /_snapshottable_features
```
which returns a response of the form:
```
{
"features": [
{
"name": "tasks",
"description": "Manages task results"
},
{
"name": "kibana",
"description": "Manages Kibana configuration and reports"
}
]
}
```
Features currently map one-to-one with `SystemIndexPlugin`s, but this should be considered an implementation detail. The Get Snapshottable Features API and snapshot creation rely upon all relevant plugins being installed on the master node.
Further, the list of feature states included in a given snapshot is exposed by the Get Snapshot API, which now includes a new field, `feature_states`, which contains a list of the feature states and their associated system indices which are included in the snapshot. All system indices in feature states are also included in the `indices` array for backwards compatibility, although explicitly requesting system indices included in a feature state is deprecated. For example, an excerpt from the Get Snapshot API showing `feature_states`:
```
"feature_states": [
{
"feature_name": "tasks",
"indices": [
".tasks"
]
}
],
"indices": [
".tasks",
"test1",
"test2"
]
```
Co-authored-by: William Brafford <william.brafford@elastic.co>
Today `GET _cluster/stats` can be quite expensive, and is typically
retrieved periodically by monitoring systems (e.g. Metricbeat) that
implement a client-side timeout. When the client times out it closes the
HTTP connection in use. With this commit we react to the close of the
HTTP connection by cancelling the ongoing stats request, avoiding
unnecessary duplicated work.
Relates #55550
As per the new licensing change for Elasticsearch and Kibana this commit
moves existing Apache 2.0 licensed source code to the new dual license
SSPL+Elastic license 2.0. In addition, existing x-pack code now uses
the new version 2.0 of the Elastic license. Full changes include:
- Updating LICENSE and NOTICE files throughout the code base, as well
as those packaged in our published artifacts
- Update IDE integration to now use the new license header on newly
created source files
- Remove references to the "OSS" distribution from our documentation
- Update build time verification checks to no longer allow Apache 2.0
license header in Elasticsearch source code
- Replace all existing Apache 2.0 license headers for non-xpack code
with updated header (vendored code with Apache 2.0 headers obviously
remains the same).
- Replace all Elastic license 1.0 headers with new 2.0 header in xpack.
Today if a client requests a cluster state and then closes the
connection then we still do all the work of computing and serializing
the cluster state before finally dropping it all on the floor.
With this commit we introduce checks to make sure that the HTTP channel
is still open before starting the serialization process. We also make
the tasks themselves cancellable and abort any ongoing waiting if the
channel is closed (mainly to make the cancellability testing easier).
Finally we introduce a more detailed description of the task to help
identify cases where clients are inefficiently requesting more
components of the cluster state than they need.
This commit allows returning a correct requested response content-type - it did not work for versioned media types.
It is done by adding new vendor specific instances to XContent and TextFormat enums. These instances can then "format" the response content type string when provided with parameters. This is similar to what SQL plugin does with its media types.
#51816
This finishes porting all tasks created in gradle build scripts and plugins to use
the task avoidance api (see #56610)
* Port all task definitions to task avoidance api
* Fix last task created during configuration
* Fix test setup in :modules:reindex
* Declare proper task inputs
(list -> copy -> add one -> wrap immutable) is a pretty common pattern in CS
updates and tests => added a shortcut for it here and used it in easily identifyable
spots.
Closes#20640.
This PR introduces a new parameter to v2 templates, `allow_auto_create`,
which allows templates to override the cluster setting `auto_create_index`.
Notes:
* `AutoCreateIndex` now looks for a matching v2 template, and if its
`allow_auto_create` setting is true, it overrides the usual logic.
* `TransportBulkAction` previously used `AutoCreateIndex` to check
whether missing indices should be created. We now rely on
`AutoCreateAction`, which was already differentiating between creating
indices and creating data streams. I've updated `AutoCreateAction` to
use `AutoCreateIndex`. Data streams are also influenced by
`allow_auto_create`, in that their default auto-create behaviour can
be disabled with this setting.
* Most of the Java file changes are due to introducing an extra
constructor parameter to `ComposableIndexTemplate`.
* I've added the new setting to various x-pack templates
* I added a YAML test to check that watches can be created even when
`auto_create_index` is `false`.
When a gzip-encoded response is decompressed the response should no more
have a content-encoding header and content-length should be set to
"unknown". GzipDecompressingEntity correctly does this for the entity
but the response still reported the original response's content-encoding
and content-length headers.
This PR adds deprecation warnings when accessing System Indices via the REST layer. At this time, these warnings are only enabled for Snapshot builds by default, to allow projects external to Elasticsearch additional time to adjust their access patterns.
Deprecation warnings will be triggered by all REST requests which access registered System Indices, except for purpose-specific APIs which access System Indices as an implementation detail a few specific APIs which will continue to allow access to system indices by default:
- `GET _cluster/health`
- `GET {index}/_recovery`
- `GET _cluster/allocation/explain`
- `GET _cluster/state`
- `POST _cluster/reroute`
- `GET {index}/_stats`
- `GET {index}/_segments`
- `GET {index}/_shard_stores`
- `GET _cat/[indices,aliases,health,recovery,shards,segments]`
Deprecation warnings for accessing system indices take the form:
```
this request accesses system indices: [.some_system_index], but in a future major version, direct access to system indices will be prevented by default
```
* Merge test runner task into RestIntegTest
* Reorganizing Standalone runner and RestIntegTest task
* Rework general test task configuration and extension
Currently we combine coordinating and primary bytes into a single bucket
for indexing pressure stats. This makes sense for rejection logic.
However, for metrics it would be useful to separate them.
We have recently added internal metrics to monitor the amount of
indexing occurring on a node. These metrics introduce back pressure to
indexing when memory utilization is too high. This commit exposes these
stats through the node stats API.
- Use java-library instead of plugin to allow api configuration usage
- Remove explicit references to runtime configurations in dependency declarations
- Make test runtime classpath input for testing convention
- required as java library will by default not have build jar file
- jar file is now explicit input of the task and gradle will ensure its properly build