Previously removed in #38514; deprecated in 7.x and defaulted to true.
With REST API compatibility, when the value is true it is ignored and a warning is emitted;
when the value is false, an exception is thrown.
relates #51816
The adjust_pure_negative value used to always be present in the to_xcontent
response, but it was changed in #49543 to only serialise this field when
the value is false.
relates #51816
The indices upgrade API (/_upgrade or /{index}/_upgrade) was removed; _reindex is suggested to be used instead.
There is no easy way to translate an _upgrade request into _reindex requests, so the dummy Upgrade action will return an exception to the user with a message indicating that _reindex should be used.
Upgrade API removal: #64732
relates #51816
This PR adds support for using the `slice` option in point-in-time searches. By
default, the slice query splits documents based on their Lucene ID. This
strategy is more efficient than the one used for scrolls, which is based on the
`_id` field and must iterate through the whole terms dictionary. When slicing a
search, the same point-in-time ID must be used across slices to guarantee the
partitions don't overlap or miss documents.
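For illustration, a sliced point-in-time search might look like the following minimal sketch, where the PIT id is a placeholder and `max` is the total number of slices:
```
GET /_search
{
  "slice": {
    "id": 0,
    "max": 2
  },
  "pit": {
    "id": "<pit-id>"
  },
  "query": {
    "match_all": {}
  }
}
```
Each slice repeats this request with its own `id` (here 0 and 1) and the same PIT id.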
Closes #65740.
This PR moves tests which are not meant to be fixed (ml/, vectors/) to a separate "not to be fixed" list so that we can see which compatible changes are meant to be implemented.
relates #51816
Changes:
* Adds a tutorial for search templates.
* Adds reference docs for the render search template API.
* Improves parameter documentation for the multi search template API.
* Removes duplicate examples from the search template API, multi search API, and create stored script API docs.
* Splits the source files for the search template API and the multi search template API docs.
This adds support for a `dry_run` parameter for the
`_ilm/migrate_to_data_tiers` API. This defaults to `false`, but when
configured to `true` it will simulate the migration of Elasticsearch
entities to data tiers based routing, returning the entities that need to
be updated (indices, ILM policies and the legacy index template that'd
be deleted, if any was configured in the request).
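A sketch of a dry run, assuming the optional body fields name the node attribute to migrate away from and the legacy template to delete (the values here are made up):
```
POST /_ilm/migrate_to_data_tiers?dry_run=true
{
  "legacy_template_to_delete": "global-template",
  "node_attribute": "custom_attribute_name"
}
```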
With types removal changes being available under REST API compatibility, I have removed the block entries for tests which are already fixed.
relates #51816
The filter-by-filter terms aggregation optimization only kicks in when
it's targeting a non-empty shard. An empty shard is fast to collect no
matter what, so there isn't really any need to do anything complex.
Anyway, this locks the test to a single-shard index so there isn't a
chance of the debugging data coming back from an empty shard, which
would cause the test to fail because it's expecting the optimization to
run.
Closes #74612
This adds the _ilm/migrate_to_data_tiers API to expose the service for
migrating the Elasticsearch abstractions (indices, ILM policies and an
optional legacy template to delete) to data tiers routing allocation
(away from custom node attributes).
Previously removed in #46943.
Parsing the type field in terms lookup is now possible with the REST
compatible API. The type field is ignored.
relates main meta issue #51816
relates type removal meta issue #54160
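A sketch of a compatible request (index, field, and document id are made up); the `type` field is parsed but has no effect:
```
GET /products/_search
{
  "query": {
    "terms": {
      "tags": {
        "index": "tag-lookup",
        "type": "_doc",
        "id": "1",
        "path": "tags"
      }
    }
  }
}
```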
This PR returns the get snapshots API to the 7.x format (and transport client behavior) and enhances it for requests that ask for multiple repositories.
The changes for requests that target multiple repositories (see the example after this list) are:
* Add `repository` field to `SnapshotInfo` and REST response
* Add `failures` map alongside `snapshots` list instead of returning just an exception response as done for single repo requests
* Pagination now works across repositories instead of being per repository for multi-repository requests
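For example, a single request can now span several repositories at once (repository names are made up):
```
GET /_snapshot/repo-1,repo-2/*
```
Each snapshot in the response carries its `repository` field, and per-repository errors end up in the `failures` map instead of failing the whole request.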
closes #69108, closes #43462
This disables the filter-by-filter aggregation optimization used by
`terms`, `range`, `date_histogram`, and `date_range` aggregations unless
we're *sure* that it's faster than the "native" implementation. Mostly this
is when the top level query is empty or we can merge it into the filter
generated by the agg rewrite process.
Now that we have hard and fast rules we can drop the cost estimation
framework without too much fear. So we remove it in this change. It
removes a bunch of complexity. Sadly, without the cost estimation stuff
we have to add a separate mechanism for blocking the optimization
against runtime fields, for which it'd be kind of garbage. For that I
added another rule preventing the filter-by-filter aggregation from
running against the queries made by runtime fields. It's not fool-proof,
but we have control over what queries we pass as a filter, so it's not
wide open.
I spent a lot of time working on an alternative to this that preserved
that fancy filter-by-filter collection mechanism and was much more kind
to the query cache. It detected cases where going full filter-by-filter
was bad and grouped those filters together to collect in one pass with a
funny ORing collector. It *worked*. And, if we were super concerned with
the performance of the `filters` aggregation it'd be the way to go. But
it was very complex and it was actually slower than using the native
aggregation for things like `terms` and `date_histogram`. It was
glorious. But it was wrong for us. Too complex and optimized the wrong
things.
So here we are. Hopefully this is a fairly simple solution to a sneaky
problem.
Previously removed in #55622. The use_field_mapping option can be used on the doc
value format under REST API compatibility.
The value itself is ignored (replaced with null), as using the field
mapping format is the default behaviour.
relates https://github.com/elastic/elasticsearch/issues/51816
This is a quality of life improvement for typical users. Almost all anomaly jobs will receive their data through a datafeed.
The datafeed config can now be supplied, and is available in the datafeed field of the job config, both when creating and when getting jobs.
If the after key cannot parse back to the same value we generated it from, when the client sends it back we will land on the wrong page. If this page is earlier, it is likely we will (eventually) generate the same wrong after key, resulting in an infinite loop as the client repeatedly asks to retrieve the same page or pages of data. This change fixes that by failing the composite aggregation (with an exception) if the after key is unparsable with the given format. We provide as much information about what failed as possible.
We made a setting to disable an optimization because, sometimes, it
turns out to be slower. The setting gives users an "escape hatch". In
our assertion for the setting we accidentally were too specific about the
aggregator we expected to run, so the test would sometimes fail
spuriously. This loosens the assertion. While we're at it, it also adds
an assertion that the optimization is enabled by default.
Closes #74374
The per-type indexing stats are simplified: when _types is requested,
total stats for the index are returned, repeated under types/_doc/.
The removal: #47203
relates main meta issue #51816
relates types removal issue #54160
Date based aggregations accept a timezone, which gets applied to both the bucketing logic and the formatter. This is usually what you want, but in the case of date formats where a timezone doesn't make any sense, it can create problems. In particular, our formatting logic and our parsing logic were doing different things for epoch_second and epoch_millis formats with time zones. This led to a problem on composite where we'd return an after key for the last bucket that would parse to a time before the last bucket, so instead of correctly returning an empty response to indicate the end of the aggregation, we'd keep returning the same last page of data.
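The kind of request affected is a composite aggregation over a date source with an epoch format plus a time zone, roughly like this (index and field names are made up):
```
GET /logs/_search
{
  "size": 0,
  "aggs": {
    "pages": {
      "composite": {
        "sources": [
          {
            "ts": {
              "date_histogram": {
                "field": "@timestamp",
                "calendar_interval": "1d",
                "format": "epoch_millis",
                "time_zone": "America/New_York"
              }
            }
          }
        ]
      }
    }
  }
}
```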
Adds a new API that allows a user to reset
an anomaly detection job.
To use the API do:
```
POST _ml/anomaly_detectors/<job_id>/_reset
```
The API removes all data associated with the job.
In particular, it deletes model state, results and stats.
However, job notifications and user annotations are not removed.
Also, the API can be called asynchronously by setting the parameter
`wait_for_completion` to `false` (defaults to `true`). When run
that way the API returns the task id for further monitoring.
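For example, to reset a job without waiting for the reset to finish (job id is made up):
```
POST _ml/anomaly_detectors/my-job/_reset?wait_for_completion=false
```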
In order to prevent the job from opening while it is resetting,
a new job field has been added called `blocked`. It is an object
that contains a `reason` and the `task_id`. `reason` can take
a value from ["delete", "reset", "revert"] as all these
operations should block the job from opening. The `task_id` is also
included in order to allow tracking the task if necessary.
Finally, this commit also sets the `blocked` field when
the revert snapshot API is called, as a job should not be opened
while it is being reverted to a different model snapshot.
The block list at the moment contains a lot of tests that we have already identified will not be fixed.
This commit moves them out of the main block list so that it is easier to track progress.
The feature branch contains changes to configure PyTorch models with a
TrainedModelConfig and defines a format to store the binary models.
The _start and _stop deployment actions control the model lifecycle
and the model can be directly evaluated with the _infer endpoint.
Two types of NLP tasks are supported: Named Entity Recognition and Fill Mask.
The feature branch consists of these PRs: #73523, #72218, #71679, #71323, #71035, #71177, #70713
Sometimes our fancy "run this agg as a Query" optimizations end up
slower than running the aggregation in the old way. We know that and use
heuristics to disable the optimization in that case. But it turns out
that the process of running the heuristics itself can be slow, depending
on the query. Worse, changing the heuristics requires an upgrade, which
means waiting. If the heuristics make a terrible choice folks need a
quick way out. This adds such a way: a cluster level setting that
contains a list of queries that are considered "too expensive" to try
and optimize. If the top level query contains any of those queries we'll
disable the "run as Query" optimization.
The default for this setting is wildcard and term-in-set queries, which
is fairly conservative. There are certainly wildcard and term-in-set
queries that the optimization works well with, but there are other queries
of that type that it works very badly with. So we're being careful.
Better, you can modify this setting in a running cluster to disable the
optimization if we find a new type of query that doesn't work well.
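Purely as an illustration, since the description doesn't name the setting, such an update could go through the cluster settings API along these lines (the setting name below is a stand-in, not the real one):
```
PUT /_cluster/settings
{
  "persistent": {
    "search.aggregations.too_expensive_queries": [ "wildcard", "terms" ]
  }
}
```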
Closes #73426
This moves the public build API and plugins into a separate included build called 'build-tools',
removing the duplication of including buildSrc twice (the second import as build-tools).
The elasticsearch internal build logic is kept in build-tools-internal as an included build, which allows us to handle this project better than if it were just a buildSrc project (e.g. we can reference tasks directly from the root build, etc.).
Convention logic applied to both projects will live in a new build-conventions project.
The repository analyzer API spec was incorrectly stored in the plugin
directory rather than in the main `rest-api-spec` directory. This commit
fixes that.
This change adds support for using `search_after` with field collapsing. When
using these in conjunction, the same field must be used for both sorting and
field collapsing. This helps keep the behavior simple and predictable.
Otherwise it would be possible for a group to appear on multiple pages of
results.
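A sketch of the combination (index and field names are made up); note that the same field drives both the sort and the collapse:
```
GET /tweets/_search
{
  "query": { "match": { "message": "elasticsearch" } },
  "collapse": { "field": "user_id" },
  "sort": [ { "user_id": "asc" } ],
  "search_after": [ "user-042" ]
}
```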
Currently search after is handled directly in `CollapsingTopDocsCollector`. As
a follow-up, we could generalize the logic and move support to the Lucene
grouping framework.
Closes #53115.
Implements V7 compatible typed endpoints for the REST search related APIs.
Retrofits the REST layer change removed in #41640.
relates main meta issue #51816
relates types removal issue #54160
The enroll node API can be used by new nodes in order to join an
existing cluster that has security features enabled. The response
of a call to this API contains all the necessary information that
the new node requires in order to configure itself and bootstrap
trust with the existing cluster.
Adds some extra debugging information to make it clear that you are
running `significant_text`. Also adds some timing information
around the `_source` fetch and the `terms` accumulation. This lets you
calculate a third useful timing number: the analysis time. It is
`collect_ns - fetch_ns - accumulation_ns`.
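The debugging information shows up when the aggregation runs under the profiler, roughly like this (index and field names are made up):
```
GET /news/_search
{
  "size": 0,
  "profile": true,
  "query": { "match": { "content": "outage" } },
  "aggs": {
    "keywords": {
      "significant_text": { "field": "content" }
    }
  }
}
```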
This also adds a half dozen extra REST tests to get a *fairly*
comprehensive set of the operations this supports. It doesn't cover all
of the significance heuristic parsing, but it's certainly much better
than what we had.
Implements V7 compatible typed endpoints for the REST put and get mapping endpoints, and for the get field mappings endpoint.
Retrofits the REST layer change removed in #41676.
relates main meta issue #51816
relates types removal issue #54160
A new API designed for use by apps like Kibana for auto-complete use cases.
A search string is supplied, which is used as a prefix for matching terms found in a given field in the index.
Supported field types are keyword, constant_keyword and flattened.
A timeout can limit the amount of time spent looking for matches (default 1s), and an `index_filter` query can limit indices, e.g. to those in the hot or warm tier, by querying the `_tier` field.
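Assuming the shape of the released API (a `_terms_enum` endpoint taking `field`, `string`, `timeout`, and `index_filter`; the index and field names are made up), a request might look like:
```
POST /stacktraces/_terms_enum
{
  "field": "error.type",
  "string": "null",
  "timeout": "1s",
  "index_filter": {
    "term": { "_tier": "data_hot" }
  }
}
```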
Closes #59137
This prevents the `date_histogram` from running out of memory allocating
empty buckets when you set the interval to something tiny like `seconds`
and aggregate over a very wide date range. Without this change we'd
allocate memory very quickly and throw an out of memory error, taking
down the node. With it we instead throw the standard "too many buckets"
error.
Relates to #71758
The test failing in #71685 does so because under rare circumstances the result
order for match_all can be different. If we want to make assertions on specific
entries in the result, we should sort by a field that imposes a fixed result
ordering.
Closes #71685
I broke composite early termination when reworking aggregations'
contract for `getLeafCollector` around early termination in #70320. We
didn't see it in our tests because we weren't properly emulating the
aggregation collection stage. This fixes early termination by adhering
to the new contract and adds more tests.
Closes #72078
Co-authored-by: Benjamin Trent <4357155+benwtrent@users.noreply.github.com>
This prevents the `histogram` aggregation from allocating tons of empty
buckets when you set the `interval` to something tiny. Instead, we
reject the request. We're not in a place where we can aggregate over
huge ranges with tiny intervals, but we should fail gracefully when you
ask us to do so rather than OOM.
Closes #71744
Currently the fleet global checkpoints API returns immediately if
the index is not ready or shards are not ready. This commit modifies the
API to wait for the index and primary shards to become active, up until
the timeout period.
Related to #71449.
This commit revives the documentation of the "Clear Cache" and
"Shard Stats" APIs of Searchable Snapshots that was removed
in #62217. This is a partial revert of the commit b545c55 with
some light wording changes.
Co-authored-by: James Rodewig <40268737+jrodewig@users.noreply.github.com>
This commit adds some per-index statistics to the `SnapshotInfo` blob:
- number of shards
- total size in bytes
- maximum number of segments per shard
It also exposes these statistics in the get snapshot API.
This commit allows the include_type_name parameter to be used with the compatible REST API.
The support for include_type_name was previously removed in #48632
relates #51816
types removal meta issue #54160
Adds support for a close_to assertion in YAML tests. The assertion can be used
as follows:
```
- close_to: { get.fields._routing: { value: 5.1, error: 0.00001 } }
```
Closes #71303
Currently the `fields` API fetches the root flattened field and returns it in a
structured way in the response. In addition, this change makes it possible to
directly query subfields. However, requesting flattened subfields via wildcard
patterns is not possible.
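A sketch of fetching a flattened subfield directly, assuming `labels` is mapped as a `flattened` field (names are made up):
```
POST /my-index/_search
{
  "query": { "match_all": {} },
  "fields": [ "labels.release" ],
  "_source": false
}
```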
Closes #70605
Since #16661 it is possible to know the total sizes for some Lucene segment files
by using the Node Stats or Indices Stats API with the include_segment_file_sizes
parameter, and the list of file extensions has been extended in #71416.
This commit adds a bit more information about file sizes: the number of files
(count), and the min, max and average sizes in bytes of files that share the same extension.
Here is a sample:
"cfs" : {
"description" : "Compound Files",
"size_in_bytes" : 2260,
"min_size_in_bytes" : 2260,
"max_size_in_bytes" : 2260,
"average_size_in_bytes" : 2260,
"count" : 1
}
This commit also simplifies how compound file sizes are computed: before,
compound segment files were extracted and their sizes aggregated with regular
non-compound file sizes (which can be confusing and out of the scope of
the original issue #6728); now CFS/CFE files appear as distinct files.
This new information is provided to give a better view of the segment
files and is useful in many cases, especially with frozen searchable snapshots
whose segment stats can now be introspected thanks to the
include_unloaded_segments parameter.
Revamps the integration tests for the `filter` agg to be more clear and
builds integration tests for the `filters` agg. Both of these
integration tests are fairly basic but they do assert that the aggs
work.
This PR introduces a new query called `combined_fields` for searching multiple
text fields. It takes a term-centric view, first analyzing the query string
into individual terms, then searching for each term in any of the fields as though
they were one combined field. It is based on Lucene's `CombinedFieldQuery`,
which takes a principled approach to scoring based on the BM25F formula.
This query provides an alternative to the `cross_fields` `multi_match` mode. It
has simpler behavior and a more robust approach to scoring.
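For example (index and field names are made up):
```
GET /articles/_search
{
  "query": {
    "combined_fields": {
      "query": "database index",
      "fields": [ "title", "abstract", "body" ]
    }
  }
}
```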
Addresses #41106.
Fleet server needs an API to access up to date global checkpoints for
indices. Additionally, it requires a mode of operation where fleet can
provide its current knowledge about the global checkpoints and poll for
advancements. This commit introduces this API in the fleet plugin.
This fixes the `global` aggregator when `profile` is enabled. It does so
by removing all of the special case handling for `global` aggs in
`AggregationPhase` and having the global aggregator itself perform the
scoped collection using the same trick that we use in filter-by-filter
mode of the `filters` aggregation.
Closes #71098