This commit moves the die-with-dignity tests into a test module so that the
_die_with_dignity endpoint is available in snapshot builds. This enables
testing of orchestration logic that manages what happens to a node after it
dies with an OutOfMemoryError.
Currently we configure per-field postings formats by asking the MapperService
for the MappedFieldType of the field in question, and then checking to see if it
is a completion field. If no MappedFieldType is available, we emit a warning.
However, MappedFieldTypes are for search fields only, and so we end up emitting
warnings for hidden sub-fields that have no corresponding field type, such as
prefix or phrase accelerator fields on text mappers.
This commit reworks things so that the MappingLookup is responsible for defining
per-field postings formats, and will detect CompletionFieldMappers at build time.
All fields that are not mapped to a completion field will just get the default postings
format. This also means that we no longer need a logger instance on CodecService.
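A minimal sketch of the idea, using illustrative names rather than the actual Elasticsearch classes: the lookup records which fields are completion fields when it is built, and the codec asks it for a per-field postings format, falling back to the default.
```java
import org.apache.lucene.codecs.PostingsFormat;

import java.util.Set;

// Hypothetical sketch: the lookup remembers completion fields at build time,
// so no MappedFieldType check (and no warning for hidden sub-fields) is needed
// when picking a postings format.
class PostingsFormatLookupSketch {
    private final Set<String> completionFields;
    private final PostingsFormat defaultFormat;
    private final PostingsFormat completionFormat;

    PostingsFormatLookupSketch(Set<String> completionFields,
                               PostingsFormat defaultFormat,
                               PostingsFormat completionFormat) {
        this.completionFields = completionFields;
        this.defaultFormat = defaultFormat;
        this.completionFormat = completionFormat;
    }

    /** Every field that is not mapped to a completion field gets the default format. */
    PostingsFormat postingsFormatForField(String field) {
        return completionFields.contains(field) ? completionFormat : defaultFormat;
    }
}
```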
Fixes #77183
It is beneficial to sort the segments within a data stream's index
in descending order of their max timestamp field, so
that the most recent (in terms of timestamp) segments
come first.
This speeds up sort queries on @timestamp in descending order,
which is the most common type of query for data streams,
as we are mostly concerned with recent data.
This patch addresses this for writable indices.
The segment sorter is different from index sorting.
An index sort by itself is concerned with the order of docs
within an individual segment (and not with how the segments are organized),
while the segment sorter is only used during search and lets
doc collection start with the "right" segment,
so we can terminate the collection sooner.
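A rough sketch of the mechanism, assuming a pre-computed max timestamp per segment (class and field names are illustrative): segments are ordered by their maximum @timestamp, newest first, so collection starts with the most recent data.
```java
import java.util.Comparator;
import java.util.List;

// Illustrative sketch: order segments by the maximum @timestamp they contain,
// newest first, so an "@timestamp desc" search can terminate collection early.
class SegmentSorterSketch {
    record SegmentInfo(String name, long maxTimestampMillis) {}

    // Descending by max timestamp: the most recent segment is visited first.
    static final Comparator<SegmentInfo> BY_MAX_TIMESTAMP_DESC =
        Comparator.comparingLong(SegmentInfo::maxTimestampMillis).reversed();

    static void sortForSearch(List<SegmentInfo> segments) {
        segments.sort(BY_MAX_TIMESTAMP_DESC);
    }
}
```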
This PR also adds an `isDataStreamIndex` property to IndexShard that
indicates whether a shard is part of a data stream.
Today `AbstractRefCounted` has a `name` field which is only used to
construct the exception message when calling `incRef()` after it's been
closed. This isn't really necessary, the stack trace will identify the
reference in question and give loads more useful detail besides. It's
also slightly irksome to have to name every single implementation.
This commit drops the name and the constructor parameter, and also
introduces a handy factory method for use when there's no extra state
needed and you just want to run a method or lambda when all references
are released.
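A hedged sketch of the shape of this (the class, method names, and exact semantics here are illustrative, not necessarily what the commit adds):
```java
import java.util.concurrent.atomic.AtomicInteger;

// Simplified ref-counting sketch: no per-instance name, and a factory for the
// common case of running something once all references are released.
class RefCountedSketch {
    private final AtomicInteger refCount = new AtomicInteger(1);
    private final Runnable onAllReleased;

    private RefCountedSketch(Runnable onAllReleased) {
        this.onAllReleased = onAllReleased;
    }

    /** Factory for when there's no extra state: just run this when fully released. */
    static RefCountedSketch onceReleased(Runnable onAllReleased) {
        return new RefCountedSketch(onAllReleased);
    }

    void incRef() {
        int current;
        do {
            current = refCount.get();
            if (current <= 0) {
                // The stack trace already identifies the reference in question,
                // so the message doesn't need a per-instance name.
                throw new IllegalStateException("already closed, can't increment ref count");
            }
        } while (refCount.compareAndSet(current, current + 1) == false);
    }

    void decRef() {
        if (refCount.decrementAndGet() == 0) {
            onAllReleased.run();
        }
    }
}
```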
This commit cleans up some cruft left over from older versions of the
`ClusterApplierService`:
- `UpdateTask` doesn't need to implement lots of interfaces and give
access to its internals, it can just pass appropriate arguments to
`runTasks()`.
- No need for the `runOnApplierThread` override with a default priority,
just have callers be explicit about the priority.
- `submitStateUpdateTask` takes a config which never has a timeout, may
as well just pass the priority and remove the dead code.
- `SafeClusterApplyListener` doesn't need to be a
`ClusterApplyListener`, may as well just be an `ActionListener<Void>`.
- No implementations of `ClusterApplyListener` care about the source
argument, may as well drop it.
- Adds assertions to prevent `ClusterApplyListener` implementations from
throwing exceptions since we just swallow them.
- No need to override getting the current time in the
`ClusterApplierService`, we can control this from the `ThreadPool`.
The randomization of the repo version often wasn't used because of the repository cache.
Force re-creating the repository every time we manually mess with the versions.
This introduces a basic public YAML REST test plugin that is intended to be used by external
Elasticsearch plugin authors. This is driven by #76215.
- Rename yaml-rest-test to intern-yaml-rest-test
- Use public yaml plugin in example plugins
Co-authored-by: Mark Vieira <portugee@gmail.com>
I noticed this recently when trying to reproduce a test failure. We do a lot of sleeping
when validating that the cluster has formed if that process happens to be slow (which it tends to be,
due to disk interaction on node starts and such). By reusing the approach for waiting on a
cluster state we rarely, if ever, need to get into the busy assert loop and can remove all these sleeps,
shaving off a few seconds here and there from running internal cluster tests.
This commit adds to the node stats API statistics that track the time the
elected master spends in the various phases of the cluster state publication
process.
Relates #76625
Today we use `ClusterChangedEvent` to represent a committed change to
the cluster state while it's being applied, and also to represent the
proposed change while it's being published. These are quite different
usages in practice, so this commit separates them by introducing a
`ClusterStatePublicationEvent` to represent the change to be published.
Relates #76625 in that we will be able to use the new
`ClusterStatePublicationEvent` to track various stats about the
publication as it progresses, but which don't make sense on a
`ClusterChangedEvent`.
* Remove Node Shutdown API feature flag
This PR removes the Node Shutdown API feature flag.
The Node Shutdown API will now always be available.
* Check if xpack is enabled in cleanup
When I removed the feature flag, I assumed that the Node Shutdown APIs would
always be available, but that turns out not to be the case if xpack isn't
enabled. Previously this case was handled, by accident, by the logic for when
the feature flag wasn't enabled.
This commit adds the check we always should have had.
* Also check version before trying cleanup
* Reformatting to keep Checkstyle happy after formatting
* Configure spotless everywhere, and disable the tasks if necessary
* Add XContentBuilder helpers, fix test
* Tweaks
* Add a TODO
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
This PR makes the delayed allocation infrastructure aware of registered node shutdowns, so that reallocation of shards will be further delayed for nodes which are known to be restarting.
To make this more configurable, the Node Shutdown APIs now support an `allocation_delay` parameter, which defaults to 5 minutes. For example:
```
PUT /_nodes/USpTGYaBSIKbgSUJR2Z9lg/shutdown
{
  "type": "restart",
  "reason": "Demonstrating how the node shutdown API works",
  "allocation_delay": "20m"
}
```
This will delay reallocation of the shards assigned to that node to another node by 20 minutes. Note that this delay will only be used if it's *longer* than the index-level allocation delay, set via `index.unassigned.node_left.delayed_timeout`.
The `allocation_delay` parameter is only valid for `restart`-type shutdown registrations, and the request will be rejected if it's used with another shutdown type.
Randomly enable recovery from snapshot in searchable snapshot and
snapshot tests to verify that recovering from a snapshot does not break other
features (those should not care about the flag).
Relates #76237
This is related to #73497. Currently, we only use the configured
transport.compression_scheme setting when compressing a request or a
response, and the cluster.remote.*.compression_scheme setting is
ignored. This commit fixes this behavior by respecting the
per-cluster setting. Additionally, it resolves confusion around inbound
and outbound connections by always responding with the same scheme that
was received. This allows remote connections to use different schemes
than local connections.
This commit adds peer recoveries from snapshots. It allows establishing a replica by downloading file data from a snapshot rather than transferring the data from the primary.
Enabling this feature is done on the repository definition. Repositories having the setting `use_for_peer_recovery=true` will be consulted to find a good snapshot when recovering a shard.
Relates #73496
With recent fixes it is never correct to simply remove a snapshot from the cluster state without
updating other snapshot entries if that entry contains any successful shards, due to possible dependencies.
This change reproduces two issues that result from removing a snapshot without regard for other queued
operations, and fixes them by making all removals of snapshots from the cluster state go through the same
code path.
Also, this change moves the tracking of a snapshot as "ending" up a few lines, to fix an assertion about finishing
snapshots that requires them to be in this collection.
This change updates the aggregation script, the map script for aggregations, and the field scripts to extend
DocBasedScript, giving them access to the new fields API.
Currently we compile the regex for each log string that we are
analyzing. This is extremely inefficient and may contribute to
instability seen in the logging IT. This commit compiles the regex a
single time.
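The fix amounts to the standard trick of hoisting the compilation out of the per-line path; a generic sketch (the pattern itself is illustrative, not the one in the IT):
```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LogLineChecker {
    // Compiled once and reused; calling Pattern.compile for every analyzed
    // log line is wasteful and was the source of the inefficiency.
    private static final Pattern LOG_LINE = Pattern.compile(
        "\\[(?<level>[A-Z]+)\\]\\[(?<logger>[^\\]]+)\\] (?<message>.*)");

    static boolean matches(String line) {
        Matcher matcher = LOG_LINE.matcher(line);
        return matcher.matches();
    }
}
```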
This refactors the signature of snapshot finalization. For one, it allows removing
a TODO about being dependent on mutable `SnapshotInfo`, which was not great. More
importantly, it sets up a follow-up where state can be shared between the
cluster state update at the end of finalization and the subsequent old-shard-generation
cleanup, so that we can resolve another open TODO about leaking shard generation files
in some cases.
We have recently introduced support for grok and dissect in the runtime fields
Painless context, which makes it possible to split a field into multiple fields. However, each runtime
field can only emit values for a single field. This commit introduces support for emitting
multiple fields from the same script.
The API call to define a runtime field that emits multiple fields is the following:
```
PUT localhost:9200/logs/_mappings
{
  "runtime" : {
    "log" : {
      "type" : "composite",
      "script" : "emit(grok(\"%{COMMONAPACHELOG}\").extract(doc[\"message.keyword\"].value))",
      "fields" : {
        "clientip" : {
          "type" : "ip"
        },
        "response" : {
          "type" : "long"
        }
      }
    }
  }
}
```
The script context for this new field type accepts two emit signatures:
* `emit(String, Object)`
* `emit(Map)`
Sub-fields need to be declared under fields in order to be discoverable through
the field_caps API and accessible through the search API.
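A minimal sketch of what an emit collector honoring these two signatures could look like (the class is illustrative, not the actual script context):
```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch of the two emit overloads: values are grouped per
// sub-field name, so one script execution can feed several sub-fields.
class CompositeEmitSketch {
    private final Map<String, List<Object>> emitted = new HashMap<>();

    /** emit(String, Object): add a single value for one sub-field. */
    public void emit(String field, Object value) {
        emitted.computeIfAbsent(field, k -> new ArrayList<>()).add(value);
    }

    /** emit(Map): add a value for every entry of the map, keyed by sub-field. */
    public void emit(Map<String, Object> values) {
        values.forEach(this::emit);
    }

    Map<String, List<Object>> emittedValues() {
        return emitted;
    }
}
```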
Multiple fields are emitted by returning multiple MappedFieldTypes
from RuntimeField#asMappedFieldTypes. The sub-fields are instances of the
runtime fields that are already supported, with a small tweak: the script
defined by their parent is adapted into an artificial script factory for each of the sub-fields,
which makes the corresponding sub-field accessible. This approach makes it possible to reuse
all of the existing runtime fields code for the sub-fields.
The runtime section has been flat so far, as it has not supported objects until now.
That stays the same, meaning that runtime fields can have dots in their names.
However, with the introduction of the ability to emit multiple fields, there are now
two ways to create the same field, so we have to make sure that a runtime field with
a certain name cannot be defined twice. That is why the following mappings are
rejected with the error `Found two runtime fields with same name [log.response]`:
```
PUT localhost:9200/logs/_mappings
{
  "runtime" : {
    "log.response" : {
      "type" : "keyword"
    },
    "log" : {
      "type" : "composite",
      "script" : "emit(\"response\", grok(\"%{COMMONAPACHELOG}\").extract(doc[\"message.keyword\"].value)?.response)",
      "fields" : {
        "response" : {
          "type" : "long"
        }
      }
    }
  }
}
```
Closes #68203
It's in the title: we were not accounting for relative paths at all
here, and were only saved by the fact that we mostly short-circuit to
non-streaming writes.
Extended the testing to catch this case for S3; a follow-up will
extend it to the other implementations as well.
Today when cancelling a task with its descendants we perform a linear
scan through all the tasks looking for the few that have the right
parent ID. With potentially hundreds of thousands of tasks this takes
quite some time, particularly if there are many tasks to cancel.
This commit introduces a second map that tracks the tasks by their
parent ID so that it's super-cheap to find the descendants that need to
be cancelled.
Closes #75316
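Conceptually the change adds an index from parent task ID to child task IDs, maintained alongside the existing task map; a simplified sketch with illustrative names:
```java
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative sketch: in addition to the flat task map, keep a second map
// keyed by parent task id so finding descendants no longer needs a full scan.
class TaskTrackerSketch {
    private final Map<Long, String> tasks = new ConcurrentHashMap<>();
    private final Map<Long, Set<Long>> childrenByParent = new ConcurrentHashMap<>();

    void register(long taskId, Long parentTaskId, String description) {
        tasks.put(taskId, description);
        if (parentTaskId != null) {
            childrenByParent.computeIfAbsent(parentTaskId, k -> ConcurrentHashMap.newKeySet()).add(taskId);
        }
    }

    void unregister(long taskId, Long parentTaskId) {
        tasks.remove(taskId);
        if (parentTaskId != null) {
            childrenByParent.computeIfPresent(parentTaskId, (k, children) -> {
                children.remove(taskId);
                return children.isEmpty() ? null : children;
            });
        }
    }

    /** O(number of children) instead of a scan over every registered task. */
    Collection<Long> childTasks(long parentTaskId) {
        return List.copyOf(childrenByParent.getOrDefault(parentTaskId, Set.of()));
    }
}
```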
This commit adds a new set of classes that compute a peer
recovery plan based on source files + target files + available
snapshots. When possible the plan tries to maximize the number of
files reused from a snapshot. It uses repositories with the `use_for_peer_recovery`
setting set to true.
It adds a new recovery setting, `indices.recovery.use_snapshots`.
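A simplified sketch of the planning step for a single candidate snapshot, identifying files by name and checksum (all names here are illustrative, not the new classes themselves):
```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Illustrative sketch: split the files needed on the target into "recover from
// snapshot" and "recover from the primary"; maximizing snapshot reuse then
// amounts to picking the candidate snapshot with the largest fromSnapshot set.
class RecoveryPlanSketch {
    record FileInfo(String name, String checksum, long length) {}
    record Plan(List<FileInfo> fromSnapshot, List<FileInfo> fromSource) {}

    static Plan plan(List<FileInfo> sourceFiles,
                     Map<String, String> targetChecksumsByName,
                     Map<String, String> snapshotChecksumsByName) {
        List<FileInfo> fromSnapshot = new ArrayList<>();
        List<FileInfo> fromSource = new ArrayList<>();
        for (FileInfo file : sourceFiles) {
            if (file.checksum().equals(targetChecksumsByName.get(file.name()))) {
                continue; // already present on the target, nothing to recover
            }
            if (file.checksum().equals(snapshotChecksumsByName.get(file.name()))) {
                fromSnapshot.add(file); // download from the snapshot repository
            } else {
                fromSource.add(file);   // fall back to copying from the primary
            }
        }
        return new Plan(fromSnapshot, fromSource);
    }
}
```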
Relates #73496
This change sets the default value of `xpack.security.enabled` to true
for all licenses. As a result, the value of the setting is read directly
from the node's settings rather than from XPackLicenseState, which
no longer needs to track it across potential license changes.
Adds minimal fields API support to sort and score scripts.
Example: `field('myfield').getValue(123)` where `123` is the default if the field has no values.
Refs: #61388
We must wait for ongoing restores to complete before shutting down the repositories
service. Otherwise we may leak file descriptors because tasks for releasing the store
are submitted to the `SNAPSHOT` or some searchable snapshot pools that quietly accept
but never reject/fail tasks after shutdown.
Same as #46178, where we had the same bug in recoveries.
Closes #75686
Today we use Strings for lots of different things when manipulating
snapshots; one crucial such thing is a shard generation. We're not very
consistent about naming the variables containing these things, and have
other kinds of generation in use, so it takes extra effort to track
shard generations through the code. This commit introduces a
`ShardGeneration` class to encapsulate just those strings that are used
as shard generations.
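The gist is a small value type wrapping the raw generation string; a rough sketch of the idea (the real class presumably also handles serialization and related concerns):
```java
import java.util.Objects;

// Illustrative sketch: wrap the raw generation string in its own type so that
// shard generations can't be mixed up with other string-typed generations.
final class ShardGenerationSketch {
    private final String rawGeneration;

    ShardGenerationSketch(String rawGeneration) {
        this.rawGeneration = Objects.requireNonNull(rawGeneration);
    }

    String raw() {
        return rawGeneration;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof ShardGenerationSketch other && rawGeneration.equals(other.rawGeneration);
    }

    @Override
    public int hashCode() {
        return rawGeneration.hashCode();
    }

    @Override
    public String toString() {
        return rawGeneration;
    }
}
```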
* Flip node shutdown feature flag to default to true on snapshot builds
It previously defaulted to false. The setting can still only be set to 'true' on a
non-release (snapshot) build of Elasticsearch.
Relates to #70338
* Handle case where operator privileges are enabled
Synced flush is going to be replaced by flush. This commit allows the synced_flush API only in v7 compatibility mode.
Worth noting: sync_id is gone and won't be available in v7 responses from indices.stats.
Relates to removal PR #50882
Relates #51816
The node executing a shard level operation would in many cases communicate `null` for the shard state update,
leading to follow-up operations incorrectly assuming an empty shard snapshot directory and starting from scratch.
Closes #75598
Today when a task is cancelled we record the reason for the cancellation
but this information is very rarely exposed to users. This commit
centralises the construction of the `TaskCancellationException` and
includes the reason in the exception message.
Closes #74825
We only create a `ReceiveTimeoutTransportException` in one place, the
timeout handler for the corresponding transport request, so the stack
trace contains no useful information and just adds noise if ever it is
logged. With this commit we drop the stack trace from these exceptions.
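One common way to achieve this in Java is to construct the exception without a writable stack trace; a hedged sketch of that technique (not necessarily the exact mechanism used here):
```java
// Sketch: a timeout exception whose stack trace carries no useful information,
// so we avoid the cost and noise of capturing it at construction time.
class TimeoutExceptionSketch extends RuntimeException {
    TimeoutExceptionSketch(String message) {
        // writableStackTrace = false: don't capture the stack when constructing
        super(message, null, false, false);
    }
}
```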
In #75454 we changed our dynamic shadowing logic to check that an unmapped
field was truly shadowed by a runtime field before returning no-op mappers. However,
this does not handle the case where the runtime field can have multiple subfields, as
will be true for the upcoming composite field type. We instead need to check that
the field in question would not be shadowed by any field type returned by any
runtime field.
This commit abstracts this logic into a new isShadowed() method on
DocumentParserContext, which uses a set of runtime field type names built from
the mapping lookup at construction time. It also simplifies the no-op mapper
slightly by making it a singleton object, as we don't need to preserve field names
here.
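A tiny sketch of the shadow check with illustrative names (the real context carries far more state):
```java
import java.util.Set;

// Illustrative sketch: the set of runtime field type names is computed once,
// when the parser context is created, from the mapping lookup.
class DocumentParserContextSketch {
    private final Set<String> runtimeFieldTypeNames;

    DocumentParserContextSketch(Set<String> runtimeFieldTypeNames) {
        this.runtimeFieldTypeNames = Set.copyOf(runtimeFieldTypeNames);
    }

    /** True if a dynamically mapped field would be shadowed by any runtime field type. */
    boolean isShadowed(String field) {
        return runtimeFieldTypeNames.contains(field);
    }
}
```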
This commit adds a new master transport action TransportGetShardSnapshotAction
that allows getting the last successful snapshot for a particular
shard in a set of repositories. It deals with the different
implementation details around BwC for repositories.
Relates #73496