It is beneficial to sort segments within a data stream's index
in descending order of their max timestamp field, so
that the most recent (in terms of timestamp) segments
come first.
This speeds up sort queries on the @timestamp field in descending order,
which is the most common type of query for data streams,
as we are mostly interested in the recent data.
This patch addresses this for writable indices.
The segment sorter is different from index sorting.
An index sort by itself is concerned with the order of docs
within an individual segment (and not how the segments are organized),
while the segment sorter is only used during search and allows
doc collection to start with the "right" segment,
so we can terminate the collection faster.
This PR adds an `isDataStreamIndex` property to IndexShard that
indicates whether a shard is part of a data stream.
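For illustration, a minimal sketch of the kind of search this optimizes, assuming a data stream named `my-data-stream` (the name is illustrative):
```
GET /my-data-stream/_search
{
  "size": 100,
  "sort": [
    { "@timestamp": "desc" }
  ]
}
```
With segments ordered by descending max timestamp, collection visits the segments holding the newest documents first and can terminate sooner.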
This introduces a basic public YAML REST test plugin that is intended to be used by external
Elasticsearch plugin authors. This is driven by #76215
- Rename yaml-rest-test to intern-yaml-rest-test
- Use public yaml plugin in example plugins
Co-authored-by: Mark Vieira <portugee@gmail.com>
Add system data streams to the "snapshot feature state" code path, so that if
we're snapshotting a feature by name we grab that feature's system data streams
too. Handle these data streams on the restore side as well.
* Add system data streams to feature state snapshots
* Don't pass system data streams through index name resolution
* Don't add no-op features to snapshots
* Hook in system data streams for snapshot restoration
As we build more functionality around data streams,
it is necessary to quickly identify if an index is part
of a data stream.
For this purpose we need to move the field mapper
for the data stream's timestamp meta-field to core.
This PR also adds a MappingLookup::isDataStreamTimestampFieldEnabled
method that can quickly check whether the data stream timestamp meta-field
is enabled.
This commit modifies the restore process to ensure that the `system`
flag is properly applied to restored data streams. Otherwise, this
flag is lost when restoring system data streams, which causes errors
and/or assertion failures as the backing indices are properly marked
as system indices, but the restored data stream is no longer a
system data stream.
Also adds a test to ensure this flag survives a round trip through
the snapshot/restore process.
There was a simple bug that caused the default of `*` or `_all` on the request to no longer work
when resolving which indices to restore, because we would add data stream indices to the restore array,
thus causing the indices filtering to think we only wanted those specific indices.
Closes #75192
ParseContext is used to parse documents. It was easily confused with ParserContext (now renamed to MappingParserContext) which is instead used to parse mappings.
To remove any confusion, this commit renames ParseContext to DocumentParserContext and adapts its subclasses accordingly.
When removing aliases, allow an alias action's index expression to match
both data streams and regular indices.
This allows API calls like `DELETE /_all/_alias/my-alias`, which are common
in teardown / cleanup logic. In this case `my-alias` just points to regular
indices, but `_all` can also expand to data streams if any exist. This can
then trigger validation logic that prevents adding aliases that refer to both
indices and data streams. However, this API call never adds an alias, it only
removes one, so failing with this validation error doesn't make much sense.
This change adjusts the validation logic so that the 'matching both data streams and
regular indices is disallowed' validation is only executed for alias actions
that add aliases.
Relates to #66163
Fix the update indices aliases API to accept wildcard expressions as alias names
in remove alias actions for aliases referring to data streams.
Also allow add alias actions for data stream aliases to contain multiple
alias names. Both changes are in line with alias actions for regular indices.
Relates to #66163
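As a hedged sketch, an update aliases request that this change permits might look like the following; the data stream and alias names are illustrative:
```
POST /_aliases
{
  "actions": [
    { "remove": { "index": "logs-app-default", "alias": "old-logs-*" } },
    { "add": { "index": "logs-app-default", "aliases": ["logs", "logs-all"] } }
  ]
}
```
Here the remove action uses a wildcard alias name and the add action specifies multiple alias names, mirroring what is already possible for regular indices.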
Change the formatter config to sort / order imports, and reformat the
codebase. We already had a config file for Eclipse users, so Spotless now
uses that.
The "Eclipse Code Formatter" plugin ought to be able to use this file as
well for import ordering, but in my experiments the results were poor.
Instead, use IntelliJ's `.editorconfig` support to configure import
ordering.
I've also added a config file for the formatter plugin.
Other changes:
* I've quietly enabled the `toggleOnOff` option for Spotless. It was
already possible to disable formatting for sections using the markers
for docs snippets, so enabling this option just accepts this reality
and makes it possible via `formatter:off` and `formatter:on` without
the restrictions around line length. It should still only be used as
a very last resort and with good reason.
* I've removed mention of the `paddedCell` option from the contributing
guide, since I haven't had to use that option for a very long time. I
moved the docs to the spotless config.
Use `is_write_index` instead of `is_write_data_stream` to indicate whether a data stream alias
is a write data stream alias. Although the latter is a more accurate name, the former is what is
used to turn a data stream alias into a write data stream in the indices aliases API. The design
of data stream aliases is that they look and behave like any other alias, and
using `is_write_data_stream` would go against this design.
Also, "index" or "indices" is an accepted overloaded term that can mean a regular index,
a data stream or an alias in Elasticsearch APIs. By using `is_write_index`, consumers
of the get aliases API don't need to make changes.
Relates to #66163
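For example, a request along these lines (names illustrative) marks a data stream as the write data stream of an alias using the existing `is_write_index` flag:
```
POST /_aliases
{
  "actions": [
    { "add": { "index": "logs-app-default", "alias": "logs", "is_write_index": true } }
  ]
}
```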
When libs/core was created, several classes were moved from server's
o.e.common package, but they were not moved to a new package. Split
packages need to go away long term, so that Elasticsearch can even think
about modularization. This commit moves all the classes under o.e.common
in core to o.e.core.
Relates to #73784
* Add new thread pool for critical operations
* Split critical thread pool into read and write
* Add POJO to hold thread pool names
* Add tests for critical thread pools
* Add thread pools to data streams
* Update settings for security plugin
* Retrieve ExecutorSelector from SystemIndices where possible
* Use a singleton ExecutorSelector
This allows indexing documents into a data stream alias.
The ingestion is then forwarded to the write index of the data stream
that is marked as the write data stream.
The `is_write_index` parameter can be used to indicate which data stream is the write data stream
when updating / adding a data stream alias.
Relates to #66163
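A minimal sketch of indexing through a data stream alias, assuming an alias named `logs` whose write data stream has already been set via `is_write_index` (names and field values are illustrative):
```
POST /logs/_doc
{
  "@timestamp": "2099-03-07T11:04:05.000Z",
  "message": "login attempt failed"
}
```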
Backing index names contain a date component.
Instead of defining what the expected backing index names are before creating the data streams,
resolve these expected backing index names after the data streams are created.
This way we avoid failures that can occur around midnight (local time), where the expected
names are computed before midnight and the data streams are created after midnight.
Closes #73510
Currently, when attempting to add an alias that points to both data streams and regular indices,
the alias does get created, but it points only to the data streams. With this change, attempting
to add such an alias results in a client error.
Currently, when adding data stream aliases with unsupported parameters (e.g. filter or routing),
the alias does get created, but without the unsupported parameters. With this change,
attempting to create aliases that point to data streams with unsupported parameters will result
in a client error.
Relates to #66163
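A hedged example of a request that now fails with a client error instead of silently dropping the unsupported parameter; the data stream, alias and filter are illustrative:
```
POST /_aliases
{
  "actions": [
    {
      "add": {
        "index": "my-data-stream",
        "alias": "logs",
        "filter": { "term": { "level": "error" } }
      }
    }
  ]
}
```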
When data stream aliases are resolved, the includeDataStreams flag of an action request should be taken into account,
so that data stream aliases aren't resolved to backing indices for APIs that don't support data streams.
Closes #73195
The get aliases API should take the aliases parameter into account when
returning aliases that refer to data streams, and should not return entries
for data streams that don't have any aliases pointing to them.
Relates to #66163
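For example (alias pattern illustrative), a request like the following should now only return entries for data streams that actually have a matching alias:
```
GET /_alias/logs*
```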
Data stream aliases are stored separately from the data streams
in the cluster state. Currently snapshot/restore only takes data
streams into account during snapshotting and restoring; this
change makes snapshot/restore also capture and restore
data stream aliases.
Which alias instances to include depends on the actual data streams
that are included in a snapshot or restored from a snapshot.
Relates to #66163
Aliases to data streams can be defined via the existing update aliases API.
An alias can only refer to data streams or only to indices (not both).
Also, the existing get aliases API has been modified to support returning
aliases that refer to data streams.
Aliases for data streams are stored separately from data streams
and refer to data streams by name, not to the backing indices of
a data stream. This means that when backing indices are added to or removed
from a data stream, the data stream alias doesn't need to be
updated.
The authorization model for aliases that refer to data streams is the
same as for aliases that refer to indices. In security, privileges can
be defined on aliases, indices and data streams. When a privilege is
granted on an alias, access is also granted on the indices that
the alias refers to (regardless of whether privileges are granted or denied
on the actual indices). The same will apply for aliases that refer
to data streams. For more details see:
https://github.com/elastic/elasticsearch/issues/66163#issuecomment-824709767
Relates to #66163
Extract usage of internal APIs from TestClustersPlugin and PluginBuildPlugin and related plugins and build logic.
This includes a refactoring of ElasticsearchDistribution to handle types
better, in a way that lets us differentiate between Elasticsearch
distribution types supported in TestClustersPlugin and types only supported
in internal plugins.
It also introduces a set of internal versions of public plugins.
As part of this we also generate the plugin descriptors now.
As a follow-up we can move these publicly used classes into
an extra project (declared as an included build).
We keep LoggedExec and VersionProperties effectively public, with a workaround for RestTestBase.
We always clean up the list of data streams depending on the indices actually
in the snapshot. Even for non-partial snapshots, data stream indices could be
excluded from the snapshot by index exclusions that remove all of the data stream's
indices from the snapshot.
We recently replaced some usages of DocumentMapper with MappingLookup in the search layer, as DocumentMapper is mutable, which can cause issues. In order to do that, MappingLookup grew and became quite similar to DocumentMapper in what it does and holds.
In many cases it makes sense to use MappingLookup instead of DocumentMapper, and we may even be able to remove DocumentMapper entirely in favour of MappingLookup in the long run.
This commit replaces some of its straightforward usages.
If `index.mapping.ignore_malformed` has been set to `true` then
there is no way to override that to `false` for a data stream's
timestamp field.
Before this commit, validation would fail because the use of the
`ignore_malformed` attribute on a data stream's timestamp field was disallowed.
This commit allows the usage of the `ignore_malformed` attribute,
so that `index.mapping.ignore_malformed` can be disabled for a
data stream's timestamp field. The `ignore_malformed` attribute
can only be set to `false`.
This allows the following index template:
```
PUT /_index_template/filebeat
{
  "index_patterns": [
    "filebeat-*"
  ],
  "template": {
    "settings": {
      "index": {
        "mapping.ignore_malformed": true
      }
    },
    "mappings": {
      "properties": {
        "@timestamp": {
          "type": "date",
          "ignore_malformed": false
        }
      }
    }
  },
  "data_stream": {}
}
```
Closes #71755
Related to #71593, we move all build logic that is only for the Elasticsearch build into
the org.elasticsearch.gradle.internal* packages.
This makes it clearer whether build logic is intended to be used by external projects.
Ultimately we want to expose only TestCluster and PluginBuildPlugin logic
to third-party plugin authors.
This is a very first step in that direction.
We constantly need to look up from index name to `IndexId`, so we might as well
just store this field as a map to simplify some code here and there and save
a few cycles during cluster state updates for snapshots with large index counts.
We think that this PR, along with #70373, has exposed some legitimate bugs
that we need to address. They're currently causing test failures that are hard
to scope and mute. For now we'll revert this commit, with the plan to
reintroduce it along with the bug fixes.
Relates to #72033, #72031.
This commit adds support for system data streams and also the first use
of a system data stream with the fleet action results data stream. A
system data stream is one that is used to store system data that users
should not interact with directly. Elasticsearch will manage these data
streams. REST API access is available for external system data streams
so that other stack components can store system data within a system
data stream. System data streams will not use the system index read and
write threadpools.
* Warn users if security is implicitly disabled
Elasticsearch has security features implicitly disabled by default for
Basic and Trial licenses, unless explicitly set in the configuration
file.
This may be good for onboarding, but it also leads to unintended insecure
clusters.
This change introduces clear warnings when security features are
implicitly disabled.
- a warning header in each REST response if security is implicitly
disabled;
- a log message during cluster boot.
Prior to this commit, when attempting to close the write index of a data stream, a validation error was returned indicating that it is forbidden to close a write index of a data stream. The idea behind that is to ensure that a data stream can always accept writes. For the same reason, deleting a write index is not allowed (the write index can only be deleted when deleting the entire data stream).
However, closing an index isn't as destructive as deleting an index (an open index request makes the write index available again), and there are other cases where a data stream can't accept writes, for example when the primary shards of the write index are not available. So the original reasoning for not allowing a write index to be closed isn't that strong.
On top of this, the restriction prevents certain administrative operations from being performed, for example restoring a snapshot containing data streams that already exist in the cluster (an in-place restore).
Closes #70903, #70861
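As a hedged illustration, after this change a request along the lines of the following, targeting a data stream's write index, is no longer rejected by this validation (the generated backing index name is hypothetical):
```
POST /.ds-my-data-stream-2099.03.07-000002/_close
```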
This commit introduces system index types that will be used to
differentiate behavior. Previously system indices were all treated the
same regardless of whether they belonged to Elasticsearch, a stack
component, or one of our solutions. Upon further discussion and
analysis this decision was not in the best interest of the various
teams and instead a new type of system index was needed. These system
indices will be referred to as external system indices. Within external
system indices, an option exists for these indices to be managed by
Elasticsearch or to be managed by the external product.
In order to represent this within Elasticsearch, each system index will
have a type and this type will be used to control behavior.
Closes #67383
Increasing the number of threads to be used for force-merging does not automatically give you any parallelism, even if
you have many shards per node, as force-merge requests are split into node-level subrequests (see
TransportBroadcastByNodeAction, superclass of TransportForceMergeAction), one for each node, and these
subrequests are then executed sequentially for all the shards on that node.
This reduces the ceremony of declaring test artifacts for a project.
It also solves an issue with the usage of the deprecated testRuntime configuration that
testArtifacts extended from, which seems not to be required at all and would have
broken with Gradle 7.0 anyhow.
Test artifact resolution is now variant-aware, which gives the consuming projects a more adequate
compile and runtime classpath.
We also introduce a convention method in the Elasticsearch build to declare
test artifact dependencies in an easy way, close to how it's done by the Gradle built-in
test fixtures plugin.
Furthermore, we cleaned up some inconsistent test dependency declarations where a project
relied on another project and on its test artifacts.
The response to an `IndicesSegmentsAction` might be large, perhaps 10s
of MBs of JSON, and today it is serialized on a transport thread. It
also might take so long to respond that the client times out, resulting
in the work needed to compute the response being wasted.
This commit introduces the `DispatchingRestToXContentListener` which
dispatches the work of serializing an `XContent` response to a
non-transport thread, and also makes `TransportBroadcastByNodeAction`
sensitive to the cancellability of its tasks.
It uses these two features to make the `RestIndicesSegmentsAction`
serialize its response on a `MANAGEMENT` thread, and to abort its work
more promptly if the client's channel is closed before the response is
sent.
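For reference, a hedged example of the endpoint served by `RestIndicesSegmentsAction`, whose response is now serialized off the transport thread (the index name is illustrative):
```
GET /my-index/_segments
```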
This PR expands the meaning of `include_global_state` for snapshots to include system indices. If `include_global_state` is `true` on creation, system indices will be included in the snapshot regardless of the contents of the `indices` field. If `include_global_state` is `true` on restoration, system indices will be restored (if included in the snapshot), regardless of the contents of the `indices` field. Index renaming is not applied to system indices, as system indices rely on their names matching certain patterns. If restored system indices are already present, they are automatically deleted prior to restoration from the snapshot to avoid conflicts.
This behavior can be overridden to an extent by including a new field in the snapshot creation or restoration call, `feature_states`, which contains an array of strings indicating the "feature" for which system indices should be snapshotted or restored. For example, this call will only restore the `watcher` and `security` system indices (in addition to `index_1`):
```
POST /_snapshot/my_repository/snapshot_2/_restore
{
  "indices": "index_1",
  "include_global_state": true,
  "feature_states": ["watcher", "security"]
}
```
If `feature_states` is present, the system indices associated with those features will be snapshotted or restored regardless of the value of `include_global_state`. All system indices can be omitted by providing a special value of `none` (`"feature_states": ["none"]`), or included by omitting the field or explicitly providing an empty array (`"feature_states": []`), similar to the `indices` field.
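For instance, a create snapshot request along these lines (repository and snapshot names are illustrative) would keep the global state but omit every feature state and its system indices:
```
PUT /_snapshot/my_repository/snapshot_3
{
  "indices": "index_*",
  "include_global_state": true,
  "feature_states": ["none"]
}
```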
The list of currently available features can be retrieved via a new "Get Snapshottable Features" API:
```
GET /_snapshottable_features
```
which returns a response of the form:
```
{
  "features": [
    {
      "name": "tasks",
      "description": "Manages task results"
    },
    {
      "name": "kibana",
      "description": "Manages Kibana configuration and reports"
    }
  ]
}
```
Features currently map one-to-one with `SystemIndexPlugin`s, but this should be considered an implementation detail. The Get Snapshottable Features API and snapshot creation rely upon all relevant plugins being installed on the master node.
Further, the list of feature states included in a given snapshot is exposed by the Get Snapshot API, which now includes a new field, `feature_states`, which contains a list of the feature states and their associated system indices which are included in the snapshot. All system indices in feature states are also included in the `indices` array for backwards compatibility, although explicitly requesting system indices included in a feature state is deprecated. For example, an excerpt from the Get Snapshot API showing `feature_states`:
```
"feature_states": [
{
"feature_name": "tasks",
"indices": [
".tasks"
]
}
],
"indices": [
".tasks",
"test1",
"test2"
]
```
Co-authored-by: William Brafford <william.brafford@elastic.co>
This has been deprecated in Gradle before, but we haven't been warned.
Gradle 7.0 will likely introduce a change in behaviour here, so we
should fix the usage of this configuration upfront.
See https://github.com/gradle/gradle/issues/16027 for further information
about the change in Gradle 7.0
As per the new licensing change for Elasticsearch and Kibana this commit
moves existing Apache 2.0 licensed source code to the new dual license
SSPL+Elastic license 2.0. In addition, existing x-pack code now uses
the new version 2.0 of the Elastic license. Full changes include:
- Updating LICENSE and NOTICE files throughout the code base, as well
as those packaged in our published artifacts
- Update IDE integration to now use the new license header on newly
created source files
- Remove references to the "OSS" distribution from our documentation
- Update build time verification checks to no longer allow Apache 2.0
license header in Elasticsearch source code
- Replace all existing Apache 2.0 license headers for non-xpack code
with updated header (vendored code with Apache 2.0 headers obviously
remains the same).
- Replace all Elastic license 1.0 headers with new 2.0 header in xpack.