Fleet server needs an API to access up to date global checkpoints for
indices. Additionally, it requires a mode of operation when fleet can
provide its current knowledge about the global checkpoints and poll for
advancements. This commit introduces this API in the fleet plugin.
This fixes the `global` aggregator when `profile` is enabled. It does so
by removing all of the special case handling for `global` aggs in
`AggregationPhase` and having the global aggregator itself perform the
scoped collection using the same trick that we use in filter-by-filter
mode of the `filters` aggregation.
Closes#71098
Drops an assertion that we can't be sure will always pass. If we're
unlucky all documents with `_doc_count` can end up on a single shard and
our assertion won't pass. In yaml we don't have the ability to assert
that *either* shard has `_doc_count`. It's ok! We have an assertion for
this in another place too.
Close#71088
This commit reenables the BWC tests after the ML roles were migrated to
server. During the course of that work, the BWC tests were disabled
pending that work being backported to 7.x. Now that that work is not
going to be backported to 7.x, instead we apply some permanent
transformations to the 7.x assertions run against ES in the REST
compatibility tests.
This commit moves the machine learning roles to server. We no longer
need to maintain these roles outside of server since we only produce a
single distribution, the default distribution, which includes all
roles. Therefore we can simplify the plugin architecture by removing the
plugin extension point for roles. This is one step in that, by moving
the machine learning roles to server.
This commit moves the data tier roles to server. It is no longer
necessary to separate these roles from server as we no longer build
distributions that would not contain these roles. Moving these roles
will simplify many things. This is deliberately the smallest possible
commit that moves these roles. Other aspects related to the data tiers
can move in separate, also small, commits.
With shared cache searchable snapshots we have shards that have a size
in S3 that differs from the locally occupied disk space. This commit
introduces `store.total_data_set_size` to node and indices stats, allowing to
differ between the two.
Relates #69820
Previously docvalue_fields for binary values with paddings did not
output padding. We consider it to be a bug because: 1) es would
not be able parse these values 2) output from source filtering
and fields API is different and does output padding.
This patches fixes this by outputing padding for binary
docvalue_fields where it is present.
This change adds _geoip/stats endpoint that can be used to collect basic data about geoip downloader (successful, failed and skipped downloads, current db count and total time spent downloading).
It also fixes missing/wrong origins for clients that will break if used with security.
Relates to #68920
The types removal effort has removed the type from Index API in #47671 and from Get API in #46587
This commit allows to use 'typed' endpoints for the both Index and Get APIs
relates compatible types-removal meta issue #54160
When the `terms` agg is at the top level it can run as a `filters` agg
instead because that is typically faster. This was added in #68871 and
we mistakely made it so that a bucket without any hits could take up a
slot on the way back to the coordinating node. You could trigger this by
having a fairly precise `size` on the terms agg and a top level filter.
This fixes the issue by properly mimicing the regular terms aggregator
in the "as filters" version: only send back buckets without any matching
documents if the min_doc_count is 0.
Closes#70449
When we disable access to system indices, plugins will still need
a way to erase their state. The obvious and most pressing use
case for this is in tests, which need to be able to clean up the
state of a cluster in between groups of tests.
* Use a HandledTransportAction for reset action
My initial cut used a TransportMasterNodeAction, which requires code
that carefully manipulates cluster state. At least for the first cut and
testing, it seems like it will be much easier to use a client within a
HandledTransportAction, which effectively makes the
TransportResetFeatureStateAction a class that dispatches other transport
actions to do the real work.
* Clean up code by using a GroupedActionListener
* ML feature state cleaner
* Implement Transform feature state reset
* Change _features/reset path to _features/_reset
Out of an abundance of caution, I think the "reset" part of this path
should have a leading underscore, so that if there's ever a reason to
implement "GET _features/<feature_id>" we won't have to worry about
distinguishing "reset" from a feature name.
Co-authored-by: Gordon Brown <gordon.brown@elastic.co>
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
If a search after request targets multiple indices and some of its sort
field has type `date` in one index but `date_nanos` in other indices,
then Elasticsearch won't interpret the search_after parameter correctly
in every target index. The sort value of a date field by default is a
long of milliseconds since the epoch while a date_nanos field is a long
of nanoseconds.
This commit introduces the `format` parameter in the sort field so a
sort value of a date or date_nanos will be formatted using a date format
in a search response.
The below example illustrates how to use this new parameter.
```js
{
"query": {
"match_all": {}
},
"sort": [
{
"timestamp": {
"order": "asc",
"format": "strict_date_optional_time_nanos"
}
}
]
}
```
```js
{
"query": {
"match_all": {}
},
"sort": [
{
"timestamp": {
"order": "asc",
"format": "strict_date_optional_time_nanos"
}
}
],
"search_after": [
"2015-01-01T12:10:30.123456789Z" // in `strict_date_optional_time_nanos` format
]
}
```
Closes#69192
Add support to delete component templates api to specify multiple template
names separated by a comma.
Change the cleanup template logic for rest tests to remove all component templates via a single delete component template request. This to optimize the cleanup logic. After each rest test we delete all templates. So deleting templates this via a single api call (and thus single cluster state update) saves a lot of time considering the number of rest tests.
Older versions don't support component / composable index templates
and/or data streams. Yet the test base class tries to remove objects
after each test, which adds a significant number of lines to the
log files (which slows the tests down). The ESRestTestCase will
now check whether all nodes have a specific version and then decide
whether data streams and component / composable index templates will
be deleted.
Also ensured that the logstash-index-template and security-index-template
aren't deleted between tests, these templates are builtin templates that
ES will install if missing. So if tests remove these templates between tests
then ES will add these template back almost immediately. These causes
many log lines and a lot of cluster state updates, which slow tests down.
Relates to #69973
Co-authored-by: Lee Hinman <dakrone@users.noreply.github.com>
Now that the PR to move flattened to core was backported, we can adjust the
skip version in REST tests. We can also remove FlattenedFeatureSetUsage, since
it is only necessary to communicate with pre-7.13 nodes.
We used to treat setting size to -1 in search request bodies or as a rest
parameter as a no-op, using the default search size of 10 in this case. This
lenient behaviour was deprecated in #69548 and is removed with this PR in 8.0.
Relates to #69548
Currently we check several search parameters for illegal values in their
SearchSourceBuilder setters, e.g. negative values throw IAE for: `size`,
`terminateAfter` and `trackTotalHits`.
The validation in the builder setters are used when parsing the above as rest
request parameters, however we currently don't check values when parsing them
from the search request body. This leads to builders with invalid parameters
that sometimes get caucht later (e.g. a negative size is triggering an
IllegalArgumentException in TotalHitCountCollector), but we should validate and
throw errors early.
This PR changes the parsing in SearchSourceBuilder to use the setters, adds
tests and also adds a deprecation for allowing a size parameter of -1, currently
meaning an "unset" value.
Closes#54958
The test failing in #69985 does so because under rare circumstance the result
order for match_all can be different. If we want to make assertions on specific
entries in the result, we should sort by a field that imposes a fixed result
ordering.
Closes#69985
A #68808 introduced a possibility to declare fields which will be only available to parsing when a compatible API was used.
This commit replaces deprecated log with compatible logging when a 'compatible only' field was used. Also includes a refactoring of LoggingDeprecationHandler method names
relates #51816
This field mapper only lived in its own module so it could be licensed as x-pack
basic. Now it can be moved to core, which matches its status as a core type.
Runtime fields telemetry has been entirely moved to be part of cluster stats API in 7.x and master. This commit removes the backwards compatibility layer that was needed before such change was backported.
Runtime fields usage is currently reported as part of the xpack feature usage API. Now that runtime fields are part of server, their corresponding stats can be moved to be part of the ordinary mapping stats exposed by the cluster stats API.
This allows many of the optimizations added in #63643 and #68871 to run
on aggregations with sub-aggregations. This should:
* Speed up `terms` aggregations on fields with less than 1000 values that
also have sub-aggregations. Locally I see 2 second searches run in 1.2
seconds.
* Applies that same speedup to `range` and `date_histogram` aggregations but
it feels less impressive because the point range queries are a little
slower to get up and go.
* Massively speed up `filters` aggregations with sub-aggregations that
don't have a `parent` aggregation or collect "other" buckets. Also
save a ton of memory while collecting them.
This commit adds date match support to aliases to the put alias, update aliases and create index APIs.
For example:
```
PUT %3Clogs-myapp-%7Bnow%2Fd%2B1d%7D-0%3E
POST logs-myapp-2021.03.03-0/_alias/%3Clogs-myapp-%7Bnow%2B1d%7D%3E
```
Or via a single api call:
```
PUT %3Clogs-myapp-%7Bnow%2Fd%2B1d%7D-0%3E
{
"aliases": {
'<logs-myapp-{now+1d}> ': {}
}
}
```
Closes#20367
Co-authored-by: Martijn van Groningen <martijn.v.groningen@gmail.com>
The endpoint `_snapshottable_features` is long and implies incorrect
things about this API - it is used not just for snapshots, but also for
the upcoming reset API. Following discussions on the team, this commit
changes the endpoint to `_features` and removes the connection between
this API and snapshots, as snapshots are not the only use for the output
of this API.
This commit changes part of a regular expression for some tests to
be more performant. While it is difficult articulate why this is change is much
faster, testing has shown for some inputs this match to be less then 1s,
where prior could take over 30s.
related: #69757
This commit adds support for two new REST test features.
warnings_regex and allowed_warnings_regex.
This is a near mirror of the warnings and allowed_warnings
warnings feature where the test can be instructed to allow
or require HTTP warnings. The difference with these new features
is that is allows the match to be based on a regular expression.
Adds support for the include_unloaded_segments flag in node stats, which helps with understanding resource usage of
shared_cache-style searchable snapshots on a per-node basis.
This speeds up the `terms` agg in a very specific case:
1. It has no child aggregations
2. It has no parent aggregations
3. There are no deleted documents
4. You are not using document level security
5. There is no top level query
6. The field has global ordinals
7. There are less than one thousand distinct terms
That is a lot of restirctions! But the speed up pretty substantial because
in those cases we can serve the entire aggregation using metadata that
lucene precomputes while it builds the index. In a real rally track we
have we get a 92% speed improvement, but the index isn't *that* big:
```
| 90th percentile service time | keyword-terms-low-cardinality | 446.031 | 36.7677 | -409.263 | ms |
```
In a rally track with a larger index I ran some tests by hand and the
aggregation went from 2200ms to 8ms.
Even though there are 7 restrictions on this, I expect it to come into
play enough to matter. Restriction 6 just means you are aggregating on
a `keyword` field. Or an `ip`. And its fairly common for `keyword`s to
have less than a thousand distinct values. Certainly not everywhere, but
some places.
I expect "cold tier" indices are very very likely not to have deleted
documents at all. And the optimization works segment by segment - so
it'll save some time on each segment without deleted documents. But more
time if the entire index doesn't have any.
The optimization builds on #68871 which translates `terms` aggregations
against low cardinality fields with global ordinals into a `filters`
aggregation. This teaches the `filters` aggregation to recognize when
it can get its results from the index metadata. Rather, it creates the
infrastructure to make that fairly simple and applies it in the case of
the queries generated by the terms aggregation.
Forces a test to use only a single shard so the assertion about the
aggregation profiler results are correct. Without this the test fails
randomly but very rarely. We have to use multiple shards (seeded random
10% choice) and we have to land all of the documents on one shard
(unseeded random 3.2%) and that shard has to be the second shard in the
list (unseeded random 50%). That works out to about 1.6% chance with an
appropriate seed - .16% without it.
Currently when a `token_count` field is defined inside a nested field, we get an
NPE because the underlying DocValueFetcher needs its formattedDocValues to be
loaded and the SourceLookup it sees needs to have a valid docId other than -1.
This change fixes those issues so the whole fields request doesn't error.
However this change doesn't solve the missing support for doc values lookup
under nested fields described in 68983. Fortunately `token_count` seems to be the only
mapping type currently affected.
Relates to #68983
This fixed "filter by filter" execution order so it doesn't ignore
`doc_count`. The "filter by filter" execution is fairly performance
sensitive but when I reran performance numbers everything looked fine.
Currently, the value fetcher framework handles ignored fields by reading
the stored values of the _ignored metadata field, and passing these through
on calls to fetchValues(). However, this means that if a document has multiple
values indexed for a field, and one malformed value, then the fields API will
ignore everything, including the valid values, and return an empty list for this
document.
If a document source contains a malformed value, then it must have been
ignored at index time. Therefore, we can safely assume that if we get an
exception parsing values from source at fetch time, they were also ignored
at index time and they can be skipped. This commit moves this exception
handling directly into SourceValueFetcher and ArraySourceValueFetcher,
removing the need to inspect the _ignored metadata and fixing the case
of mixed valid and invalid values.
Add a `max_analyzed_offset` query parameter to allow users
to limit the highlighting of text fields to a value less than or equal to the
`index.highlight.max_analyzed_offset`, thus avoiding an exception when
the length of the text field exceeds the limit. The highlighting still takes place,
but stops at the length defined by the new parameter.
Closes: #52155
This PR expands the meaning of `include_global_state` for snapshots to include system indices. If `include_global_state` is `true` on creation, system indices will be included in the snapshot regardless of the contents of the `indices` field. If `include_global_state` is `true` on restoration, system indices will be restored (if included in the snapshot), regardless of the contents of the `indices` field. Index renaming is not applied to system indices, as system indices rely on their names matching certain patterns. If restored system indices are already present, they are automatically deleted prior to restoration from the snapshot to avoid conflicts.
This behavior can be overridden to an extent by including a new field in the snapshot creation or restoration call, `feature_states`, which contains an array of strings indicating the "feature" for which system indices should be snapshotted or restored. For example, this call will only restore the `watcher` and `security` system indices (in addition to `index_1`):
```
POST /_snapshot/my_repository/snapshot_2/_restore
{
"indices": "index_1",
"include_global_state": true,
"feature_states": ["watcher", "security"]
}
```
If `feature_states` is present, the system indices associated with those features will be snapshotted or restored regardless of the value of `include_global_state`. All system indices can be omitted by providing a special value of `none` (`"feature_states": ["none"]`), or included by omitting the field or explicitly providing an empty array (`"feature_states": []`), similar to the `indices` field.
The list of currently available features can be retrieved via a new "Get Snapshottable Features" API:
```
GET /_snapshottable_features
```
which returns a response of the form:
```
{
"features": [
{
"name": "tasks",
"description": "Manages task results"
},
{
"name": "kibana",
"description": "Manages Kibana configuration and reports"
}
]
}
```
Features currently map one-to-one with `SystemIndexPlugin`s, but this should be considered an implementation detail. The Get Snapshottable Features API and snapshot creation rely upon all relevant plugins being installed on the master node.
Further, the list of feature states included in a given snapshot is exposed by the Get Snapshot API, which now includes a new field, `feature_states`, which contains a list of the feature states and their associated system indices which are included in the snapshot. All system indices in feature states are also included in the `indices` array for backwards compatibility, although explicitly requesting system indices included in a feature state is deprecated. For example, an excerpt from the Get Snapshot API showing `feature_states`:
```
"feature_states": [
{
"feature_name": "tasks",
"indices": [
".tasks"
]
}
],
"indices": [
".tasks",
"test1",
"test2"
]
```
Co-authored-by: William Brafford <william.brafford@elastic.co>
This change adds tests around the handling of mixed object and dot notation in
document source when using the `fields` API with nested fields left out
of #67432. After merging #68540, this test can now be added.
Relates to #67432
This partially reverts #64016 and and adds #67839 and adds
additional tests that would have caught issues with the changes
in #64016. It's mostly Nik's code, I am just cleaning things up
a bit.
Co-authored-by: Nik Everett <nik9000@gmail.com>
This commit adds the `data_frozen` node role as part of the formalization of data tiers. It also
adds the `"frozen"` phase to ILM, currently allowing the same actions as the existing cold phase.
The frozen phase is intended to be used for data even less frequently searched than the cold phase,
and will eventually be loosely tied to data using partial searchable snapshots (as oppposed to full
searchable snapshots in the cold phase).
Relates to #60848
Types are no longer allowed in requests in 8.0, so we can remove support for
using the `_type` field within a search request.
Relates to #41059.
Closes#68311.
At the moment, the ‘fields’ API handles nested fields the same way I handles non-nested object arrays: it just returns them in a flat list. However, the relationship between nested fields is something we should try to preserve, since this is the main purpose of mapping something as “nested” instead of just using an object.
This PR changes this by returning grouped field values that are inside a nested object according to the nested object they initially appear in. Any further object structures inside a nested object are again returned as a flattened list. Fields inside nested fields don’t appear in the flattened response outside of the nested path any more. The grouping of fields inside nested objects is applied recursively if nested mappings are defined inside another nested mapping.
Closes#63709
As per the new licensing change for Elasticsearch and Kibana this commit
moves existing Apache 2.0 licensed source code to the new dual license
SSPL+Elastic license 2.0. In addition, existing x-pack code now uses
the new version 2.0 of the Elastic license. Full changes include:
- Updating LICENSE and NOTICE files throughout the code base, as well
as those packaged in our published artifacts
- Update IDE integration to now use the new license header on newly
created source files
- Remove references to the "OSS" distribution from our documentation
- Update build time verification checks to no longer allow Apache 2.0
license header in Elasticsearch source code
- Replace all existing Apache 2.0 license headers for non-xpack code
with updated header (vendored code with Apache 2.0 headers obviously
remains the same).
- Replace all Elastic license 1.0 headers with new 2.0 header in xpack.
Since the "*,-*" pattern resolves to "no indices", it makes a normally
destructive action into a non-destructive one. Rather than throwing a
wildcards-not-allowed exception, we can allow this pattern to pass
without triggering an exception. This allows the security layer to
safely use a "*,-*" pattern to indicate a "no indices" result for its
index resolution step, which is important because otherwise we get
wildcards-not-allowed exceptions when trying to delete nonexistent
concrete indices. For simplicity, we require exactly "*,-*", rather than
any other wildcards that might be logically equivalent.
This commit mostly reverts #67934, except for the change to the version
constant `REPOSITORY_UUID_IN_REPO_DATA_VERSION`.
Completes the backport of #67829 via #67899
This commit suppresses any BWC tests related to snapshots in `master` so
that #67899 can be merged to `7.x`. It will mostly be reverted after the
merge of #67899 is complete.
Relates #66431
Today a snapshot repository does not have a well-defined identity. It
can be reregistered with a different cluster under a different name, and
can even be registered with multiple clusters in readonly mode.
This presents problems for cases where we need to refer to a specific
snapshot in a globally-unique fashion. Today we rely on the repository
being registered under the same name on every cluster, but this is not a
safe assumption.
This commit adds a UUID that can be used to uniquely identify a
repository. The UUID is stored in the top-level index blob, represented
by `RepositoryData`, and is also usually copied into the
`RepositoryMetadata` that represents the repository in the cluster
state. The repository UUID is exposed in the get-repositories API; other
more meaningful consumers will be added in due course.
Part of the fixes for #66419, this commit permits nodes to emit the
deprecation warning regarding not specifying `?wait_for_active_shards`
when closing an index in 7.x versions for x ≥ 12. This change is
required on `master` too since the BWC tests encounter these warnings.
Relates #67246, which is the 7.x part of this change.
* Adds a minimum version request parameter to SearchRequest.
The minimum version helps failing a request if any shards
involved in the search do not meet the compatibility requirements
(all shards need to have a version equal or later than the minimum
version provided).
In 7.x the close indices API defaulted to `?wait_for_active_shards=0`
but from 8.0 it defaults to respecting the index settings instead. This
commit introduces the `index-setting` value for this parameter on this
API allowing users to opt-in to the future behaviour today, and emits a
deprecation warning indicating that the default no longer needs to be
used and will be unsupported in future.
In 7.x a follow up PR will introduce support for the same
`index-setting` value for this parameter and will emit deprecation
warnings if users try and use the default instead.
Relates #66419
When I merged #67043 it had an integration test for the thing it was
fixing but it still fails in the bwc tests. Yikes! I should know better
but life is life. Anyway, this updates the skip to ignore the test for
now. I'll reenable once the backport is in.
Fixes a bug where nested documents that match a filter in the `filters`
agg will be counted as matching the filter. Usually nested documents
only match if you explicitly ask to match them. Worse, we only mach them
in the "filter by filter" mode that we wrote to speed up date_histogram.
The `filters` agg is fairly rare, but with #63643 we run
`date_histogram` and `range` aggregations using `filters.