As described in the issue, the change in #96763
has made the MixedClusterClientYamlTestSuiteIT for mget fail very
often. For now, let's take the same approach that we have for get.
Closes #97236
For snapshot builds, we automatically enable all feature flags,
but for release builds they need to be explicitly added to
test clusters.
This PR does so for the synonyms feature.
Closes #96641, #97177
A number of aggregations don't support counter fields,
because their computation doesn't make sense on these fields.
For example, computing an average on a counter doesn't make
sense.
Relates to #93539
`GET _cat/allocation` is a useful way to get a high-level view of the
balance of a cluster, but clusters are only balanced within each data
tier and today this API does not expose node roles. This commit adds an
optional `node.role` column to this API.
Added additional fields to SearchProfileResults for XContent output: node_id, cluster, index, shard_id.
They are derived by parsing the existing composite ID with the new parseProfileShardId method, which reverses
the SearchShardTarget.toString method.
No new information is added here; the change merely splits out the four pieces of information
in the profile shard's "composite" id, which is created by the SearchShardTarget.toString method.
Profile/shards output now has the form:
```
"profile": {
  "shards": [
    {
      "id": "[2m7SW9oIRrirdrwirM1mwQ][blogs][0]",
      "node_id": "2m7SW9oIRrirdrwirM1mwQ",
      "shard_id": "0",
      "index": "blogs",
      "cluster": "(local)",
      "searches": [ ... ]
      ...
    },
    {
      "id": "[UngEVXTBQL-7w5j_tftGAQ][remote1:blogs][2]",
      "node_id": "UngEVXTBQL-7w5j_tftGAQ",
      "shard_id": "2",
      "index": "blogs",
      "cluster": "remote1",
      "searches": [ ... ]
      ...
```
Here the second shard is on a remote cluster, which is visible as the `remote1:` prefix on the index name in the composite id.
Partially addresses #25896
Added a yamlRestTest for the new fields in the profile response.
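The following is a minimal Python sketch of how a composite id of the form shown above can be split back into its four parts. `parse_profile_shard_id` and `ProfileShardId` are hypothetical illustrations, not the Java `parseProfileShardId` method. The sketch assumes that neither node ids, cluster aliases, nor index names contain `]`, and that only a remote index carries a `cluster:` prefix.

```python
import re
from typing import NamedTuple


class ProfileShardId(NamedTuple):
    node_id: str
    cluster: str  # "(local)" for the local cluster, as in the output above
    index: str
    shard_id: str


# "[nodeId][index or cluster:index][shardNumber]", as in the "id" values above
_COMPOSITE_ID = re.compile(r"\[([^\]]+)\]\[([^\]]+)\]\[(\d+)\]")


def parse_profile_shard_id(composite_id: str) -> ProfileShardId:
    """Split a composite profile shard id into its four parts (hypothetical helper)."""
    match = _COMPOSITE_ID.fullmatch(composite_id)
    if match is None:
        raise ValueError(f"unrecognized profile shard id: {composite_id!r}")
    node_id, index_expr, shard_id = match.groups()
    # A remote index is written as "cluster:index"; a local index has no prefix.
    cluster, sep, index = index_expr.partition(":")
    if not sep:
        cluster, index = "(local)", index_expr
    return ProfileShardId(node_id, cluster, index, shard_id)
```

For example, `parse_profile_shard_id("[UngEVXTBQL-7w5j_tftGAQ][remote1:blogs][2]")` yields `cluster="remote1"` and `index="blogs"`, which matches the second shard in the output above.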
This PR adds a new optional parameter, "resource", to ReloadAnalyzersRequest.
If used, only analyzers that use this specific "resource" will be reloaded.
This parameter is not documented; it is for internal use only.
PR #96886 introduced auto-reload of analyzers on synonyms index changes. The problem
was that reloading was applied broadly to all indices that contained reloadable
analyzers. This PR improves this so that when a particular synonyms set changes,
only analyzers that use this synonyms set will be auto-reloaded. Note that shard
requests will still be sent to the shards of all indices, since only at the shard level can we
decide whether analyzers need to be reloaded.
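Below is a minimal Python sketch of the shard-level decision described above. The `Analyzer` record and its `resources` set are hypothetical stand-ins, not Elasticsearch classes. The filter can only run where an index's analyzers are known, which is why every shard still receives the request.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Optional, Set


@dataclass
class Analyzer:
    name: str
    reloadable: bool
    # Hypothetical: the resources (e.g. synonyms set ids) this analyzer depends on
    resources: Set[str] = field(default_factory=set)


def analyzers_to_reload(analyzers: Iterable[Analyzer], resource: Optional[str] = None) -> List[str]:
    """Pick the analyzers a shard should reload; no resource means reload all reloadable ones."""
    selected = []
    for analyzer in analyzers:
        if not analyzer.reloadable:
            continue
        if resource is None or resource in analyzer.resources:
            selected.append(analyzer.name)
    return selected
```

Without `resource`, the sketch reproduces the earlier broad behaviour. With it, a change to one synonyms set touches only the analyzers that reference that set.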
Synonym Management API project
When the synonyms in a synonyms set change, auto-reload analyzers.
Note that currently all updateable analyzers will be reloaded, even
those that are not relevant to the synonyms set being updated.
* WIP Started geo_line for TSDB work
Starting with YAML tests (which currently pass) and AggregatorTests
(currently failing, likely due to a mistake in the tests)
* Update docs/changelog/94954.yaml
* WIP Refactoring to prepare for TSDB geo_line
* Created TimeSeries version of GeoLineAggregator, and wired it in so that time-series aggregations use it, but current behavior is still identical to non-time-series.
* Added both yaml and unit tests for testing that geo_line works with correct results in both time-series and non-time-series cases.
* Added additional tests to verify the grouping behaviour of time-series vs. terms aggs, and the combination of the two.
* WIP Refactoring to prepare for TSDB geo_line
* Started refactoring to re-use simplifier for all buckets
* Fixed bug with leaf collector not changing per segment
* Fixed bug with leaf collector not detecting bucket changes
The bucket id can change within a segment, so we need to detect this and save the geo_line.
* Renamed class since it no longer extends BucketedSort
The original geo_line relied on the BucketedSort for all intelligence.
The time-series geo_line uses none of that, and does its own memory management.
* Fixed bug with geo_point leaking between geo_line buckets
And enhanced unit tests to cover multiple groups
* Code review updates
* Verify that the sort field is specifically the TS timestamp
Only activate the time-series optimizations if the aggregation is both:
  * Within a time-series aggregation (i.e. tsid and @timestamp ordered)
  * The geo_line sort field is @timestamp
* Allow geo_point time-series to skip sort config
Also disables the new time-series geo_line, even when the correct
sort and point fields are used, if the point field is not explicitly
configured as a position metric.
* Support geo_centroid and geo_bounds on position metric
* Update yaml tests for multi-terms tests
* Changed to disallow alternative sort-fields in ts-geo_line
Since the primary criterion for switching to the new algorithm is that
geo_line is within a time-series aggregation, we now disallow any other sort field.
We test the negative case in the yaml tests, but changed the unit tests to
use TermsAggregation to mimic the time-series aggregation and get comparable
results.
* For non-time-series check missing sort field early
The old code only threw an error if there was data, because the check was done
inside the leaf collector just before actually reading the sort field.
There were also no tests for a missing sort field.
This commit adds the tests, and checks early so that the error is raised even if there is no data.
* Reviewed TODOs
* Test that behaviour is identical with or without POSITION metric
* Removed fallback code in builder (was switching to old geo_line without POSITION metric)
* Removed two TODOs that are no longer valid concerns
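The two leaf-collector fixes above, detecting bucket changes within a segment and not leaking points between buckets, can be illustrated with a minimal Python sketch. It is not the Java aggregator. It assumes each bucket's documents arrive contiguously and already in the desired timestamp order. It also omits line simplification and memory management. `collect_geo_lines` is a hypothetical name.

```python
from typing import Dict, Iterable, List, Optional, Tuple

Point = Tuple[float, float]  # (lon, lat)


def collect_geo_lines(docs: Iterable[Tuple[int, int, Point]]) -> Dict[int, List[Point]]:
    """docs yields (bucket_id, timestamp, point) in time-series order (assumed)."""
    lines: Dict[int, List[Point]] = {}
    current_bucket: Optional[int] = None
    buffer: List[Point] = []
    for bucket_id, _timestamp, point in docs:
        if bucket_id != current_bucket:
            # The bucket id can change mid-segment: save the finished line first
            if current_bucket is not None:
                lines.setdefault(current_bucket, []).extend(buffer)
            # Start a fresh buffer so points never leak into the next bucket
            current_bucket = bucket_id
            buffer = []
        buffer.append(point)
    # Save the last line when the documents run out
    if current_bucket is not None:
        lines.setdefault(current_bucket, []).extend(buffer)
    return lines
```

Resetting `buffer` on every bucket change is the part that prevents a `geo_point` from one bucket ending up in the next bucket's line.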
* Add repo throttle metrics to node stats api response
* Update docs/changelog/96678.yaml
* Change x-content output structure
* Fix test after merge from main
* Follow PR comments
* minor fixes
* minor fixes 2
* Introduce new TransportVersion (V_8_500_010)
* Fix yaml test
* Follow PR comments
* Make stats datapoints human readable
* Follow common pattern for human readable output
* Bump up TransportVersion
Add a new target (`script`) to the `/_info` API. It consolidates all the script information from the cluster nodes and returns a summary at the cluster level (compared with `_nodes/stats/script` it lacks the `<node>` dimension).
PR #95895 implemented the PUT request for adding synonyms, but left validation
for later. This adds simple validation for synonyms that is not
tied to a specific analyzer: it only checks that the format
of a synonym rule is correct.
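As an illustration of the kind of format-only check described above, here is a minimal Python sketch. It assumes rules use the Solr synonym format, either equivalent terms (`a, b, c`) or an explicit mapping (`a, b => c`). It is a simplified stand-in, not the Elasticsearch or Lucene parser. `validate_synonym_rule` is a hypothetical name.

```python
from typing import List


def validate_synonym_rule(rule: str) -> List[str]:
    """Return a list of problems; an empty list means the rule looks well-formed."""
    errors: List[str] = []
    sides = rule.split("=>")
    if len(sides) > 2:
        # An explicit mapping has exactly one left-hand and one right-hand side
        errors.append("more than one '=>' in rule")
        return errors
    for side in sides:
        # Every comma-separated term on either side must be non-empty
        terms = [t.strip() for t in side.split(",")]
        if any(not t for t in terms):
            errors.append(f"empty term in {side.strip()!r}")
    return errors
```

Since no analyzer is involved, the check cannot tell whether the terms will tokenize sensibly. It only catches structural mistakes such as `a => b => c` or `a, , b`.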
Sometimes a segment only contains tombstone documents. In that case, loading the min and max @timestamp field values can result in an NPE, because these documents don't have a @timestamp field.
This change fixes that by checking for the existence of the @timestamp field in the segment's field infos.
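Here is a minimal Python sketch of the guard described above. `Segment` is a hypothetical stand-in for a Lucene segment: a mapping from field name to indexed values that contains a field only if at least one document in the segment has it. Checking for the field before reading values is the analogue of consulting the field infos.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

TIMESTAMP_FIELD = "@timestamp"


@dataclass
class Segment:
    # Hypothetical stand-in: field name -> indexed values. A segment holding
    # only tombstones has no "@timestamp" entry at all.
    fields: Dict[str, List[int]] = field(default_factory=dict)


def timestamp_bounds(segment: Segment) -> Optional[Tuple[int, int]]:
    """Return (min, max) @timestamp, or None if the segment has no such field."""
    # Check the "field infos" first instead of dereferencing missing values
    if TIMESTAMP_FIELD not in segment.fields:
        return None
    values = segment.fields[TIMESTAMP_FIELD]
    return min(values), max(values)
```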
Add a new target (`thread_pool`) to the `/_info` API. It consolidates all the thread pool information from the cluster nodes and returns a summary at the cluster level (compared with `_nodes/stats/thread_pool`, it lacks the `<node>` dimension).
Adding a new endpoint under `_info/http`. This endpoint summarises the HTTP info of all the nodes into one big response, at cluster level. Compared with `_nodes/stats`, it lacks the nodes dimension.
Include search idle info to shard stats (#95740)
Extend the Index Stats API to include information about shard idleness.
To be more precise we include two pieces of info:
* a boolean indicating if a shard is idle at the moment the API call takes place
* a long value indicating the time in milliseconds the shard has been search idle
since the last time it went into search idle state
This reverts commit f523caf44d.
#95740 introduced a transport protocol change but used the old
stack-version-based transport versions. This breaks inter-commit backward
compatibility. This pull request reverts the change for now, until we can go
back and reintroduce it using a new transport version.
Extend the Index Stats API to include information about shard idleness.
To be more precise we include two pieces of info:
* a boolean indicating if a shard is idle at the moment the API call takes place
* a long value indicating the time in milliseconds the shard has been search idle
since the last time it went into search idle state
We want to allow overriding the info (`GET /`) API in serverless; therefore, this commit moves `RestMainAction` and its transport classes into a module that has a REST plugin.
The main endpoint is often used in testing to verify that a cluster is ready, so this commit also has to add a testing dependency on main to a lot of modules.
Relates #95422
The tests in the `13_fields.yml` file take a very long time in any test cluster with a single node.
This is because the create index API call in the test setup waits for all shards to be allocated,
including the replica shards, which never happens in a single-node cluster.
The create index API waits for this for 30 seconds.
This wait seems to have been added a while ago to deal with mixed cluster REST test failures,
where no shards appeared to be available, causing the tests to fail.
However, the subsequent cluster health API call should be sufficient to ensure that at least one copy of each shard is allocated.
This change also reduces the number of shards from 3 to 2 (2 shards are sufficient for this test) and
restricts the cluster health API call to the test1 index.
Closes #95580
Fix the following mistakes in the rest-api specification for the new
endpoints.
- Changed the stability of the endpoints from stable to experimental.
- Removed the following paths that we decided we do not support.
Relates to: #93596
* Allow multiple field names/patterns for (path_)(un)match (#66364)
Arrays of patterns are now allowed for dynamic_templates in the match,
unmatch, path_match and path_unmatch fields. DynamicTemplate has been modified to
support List<String> for these fields. The patterns can be either simple wildcards
or regex. As with previous functionality, when match_pattern="regex", simple wildcards
will be flagged with an error, but when match_pattern="simple", using regular expressions
in the match will not throw an error.
One new error pathway was added: if a user specifies a list of non-strings for
one of these pattern fields (e.g., "match": [10, false]), a MapperParsingException
will be thrown.
A dynamic_template yamlRestTest was added. This is a BWC change, so the REST test
that uses arrays of patterns is limited to v8.9 and above.
Closes #66364.
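Here is a minimal Python sketch of the matching rules described above. A template applies when any `match` pattern matches and no `unmatch` pattern does, with `match_pattern` choosing between simple wildcards and regex. The function names are hypothetical and `ValueError` stands in for the mapper parsing error. Unlike Elasticsearch, which validates patterns when the mapping is parsed, this sketch only surfaces errors at match time. The same logic would apply to `path_match`/`path_unmatch` against the full dotted path.

```python
import re
from typing import Iterable, List, Union

PatternSpec = Union[str, Iterable[object], None]


def _as_pattern_list(patterns: PatternSpec) -> List[str]:
    """Accept a single pattern or a list of patterns; reject non-string entries."""
    if patterns is None:
        return []
    if isinstance(patterns, str):
        return [patterns]
    result = list(patterns)
    if not all(isinstance(p, str) for p in result):
        # Stands in for the mapper parsing error raised for e.g. "match": [10, false]
        raise ValueError(f"patterns must be strings, got {result!r}")
    return result


def _matches(pattern: str, name: str, match_pattern: str) -> bool:
    if match_pattern == "regex":
        # A simple wildcard such as "*foo" is an invalid regex and raises re.error
        return re.fullmatch(pattern, name) is not None
    # "simple": only '*' is a wildcard; every other character is literal
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.fullmatch(regex, name) is not None


def template_applies(name: str, match: PatternSpec = None, unmatch: PatternSpec = None,
                     match_pattern: str = "simple") -> bool:
    """True if any `match` pattern (when given) matches and no `unmatch` pattern does."""
    includes = _as_pattern_list(match)
    excludes = _as_pattern_list(unmatch)
    if includes and not any(_matches(p, name, match_pattern) for p in includes):
        return False
    return not any(_matches(p, name, match_pattern) for p in excludes)
```

For example, `template_applies("client_ip", match=["ip_*", "*_ip"])` is true because the second pattern matches. In simple mode, `"a.b"` matches only the literal name `a.b`, never `axb`.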
This change adds:
* Total global ordinal build time for all fields and per field.
* Max shard value count per field: the value count of the shard with the highest count. Reporting the value at the index level or across indices would be too expensive to compute and keep track of.
This is added to common stats, which
is exposed in several stats APIs.
The following api call:
```
GET /_nodes/stats?filter_path=nodes.*.indices.fielddata&fields=key,key2
```
Returns:
```
{
  "nodes": {
    "pcMNy4GsQ8ef6Rw-bI2EFg": {
      "indices": {
        "fielddata": {
          "memory_size_in_bytes": 2552,
          "evictions": 0,
          "fields": {
            "key2": {
              "memory_size_in_bytes": 1320
            },
            "key": {
              "memory_size_in_bytes": 1232
            }
          },
          "global_ordinals": {
            "build_time_in_millis": 8,
            "fields": {
              "key2": {
                "build_time_in_millis": 4,
                "shard_max_value_count": 4
              },
              "key": {
                "build_time_in_millis": 4,
                "shard_max_value_count": 4
              }
            }
          }
        }
      }
    }
  }
}
```
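Here is a minimal Python sketch of how per-shard numbers could roll up into the `global_ordinals` section shown above. Build times are summed, while only the highest per-shard value count is kept. The per-shard input format and `merge_global_ordinals` are hypothetical. The example shard numbers are made up so that the merged result equals the response above; they are not the actual shard breakdown.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical per-shard input: field name -> (build_time_in_millis, value_count)
ShardStats = Dict[str, Tuple[int, int]]


def merge_global_ordinals(shards: Iterable[ShardStats]) -> dict:
    total_build_time = 0
    fields: Dict[str, Dict[str, int]] = {}
    for shard in shards:
        for name, (build_time, value_count) in shard.items():
            total_build_time += build_time
            entry = fields.setdefault(name, {"build_time_in_millis": 0, "shard_max_value_count": 0})
            # Build times add up across shards ...
            entry["build_time_in_millis"] += build_time
            # ... but only the count of the shard with the highest value count is kept
            entry["shard_max_value_count"] = max(entry["shard_max_value_count"], value_count)
    return {"build_time_in_millis": total_build_time, "fields": fields}
```

Keeping a maximum instead of a sum avoids having to deduplicate values across shards. As noted above, that deduplication would be too expensive to track.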
This PR is similar to #46586.
When waiting for no initializing shards, we also have to wait for events
when there is more than one node in the cluster. When the primary is
started, there is a short period of time where neither the primary nor
any of the replicas are initializing.
Normally, dimension fields are identified by means of a boolean mapping parameter,
time_series_dimension. Flattened fields do not have mappings
other than the one identifying the top-level field as a flattened field type. Moreover, a boolean
is not enough to identify the top-level field as a dimension, since we would like
users to be able to specify a subset of the fields in the flattened field as dimensions
(not necessarily all of them). For this reason, we introduce a new mapping parameter,
time_series_dimensions, which lists the fields, in any order, in the flattened field
that the user wants as dimensions. Field names must not include the root field name;
each name is the relative path from the root down to the leaf field.
We require flattened fields to be indexed and to have doc values, and we disallow usage
of the ignore_above parameter together with time_series_dimensions.
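Here is a minimal Python sketch of the two rules above: the mapping constraints, and picking dimension values by relative path. The function names are hypothetical, not Elasticsearch code. The sketch assumes `index` and `doc_values` default to true when absent from the mapping.

```python
from typing import Any, Dict, Iterable, List


def validate_flattened_dimensions(mapping: Dict[str, Any]) -> List[str]:
    """Check the constraints above; missing index/doc_values are assumed to be true."""
    if not mapping.get("time_series_dimensions"):
        return []
    errors = []
    if not mapping.get("index", True):
        errors.append("time_series_dimensions requires index: true")
    if not mapping.get("doc_values", True):
        errors.append("time_series_dimensions requires doc_values: true")
    if "ignore_above" in mapping:
        errors.append("ignore_above cannot be combined with time_series_dimensions")
    return errors


def _leaves(obj: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Flatten nested objects into dotted paths relative to the flattened field's root."""
    out: Dict[str, Any] = {}
    for key, value in obj.items():
        path = prefix + key
        if isinstance(value, dict):
            out.update(_leaves(value, path + "."))
        else:
            out[path] = value
    return out


def dimension_values(value: Dict[str, Any], dimensions: Iterable[str]) -> Dict[str, Any]:
    """Pick the configured dimension leaves; the order of `dimensions` does not matter."""
    leaves = _leaves(value)
    return {path: leaves[path] for path in sorted(set(dimensions)) if path in leaves}
```

For a flattened field `labels` holding `{"host": {"name": "h1", "ip": "10.0.0.1"}, "region": "eu"}`, listing `["region", "host.name"]` (not `labels.host.name`) selects `h1` and `eu` and ignores `host.ip`.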
This introduces an endpoint to reset the desired balance.
It could be used, if the computed balance has diverged a lot from the actual one,
to start a new computation from the current state.
The `.watches` index is a system index, which means that its settings
cannot be modified by the user. This commit adds APIs (`PUT
/_watcher/settings` and `GET /_watcher/settings`) that allow modifying
and retrieving a subset of index settings for the `.watches` index.
The settings that are currently allowed are `index.number_of_replicas`
and `index.auto_expand_replicas`, though more may be added in the
future.
Resolves https://github.com/elastic/elasticsearch/issues/92991
This PR enables downloading packaged models from `ml-models.elastic.co`,
an endpoint provided by Elastic. Elastic-provided model ids begin with a
`.`, a private namespace that does not interfere with user
models (the `.` prefix is disallowed for them). If a user puts a packaged
model, the model is automatically downloaded. For air-gapped
environments, it is possible to load models from a file.
earlier changes: #95175, #95207
A trained model deployment can be started with an optional deployment id.
Deployment ids and model ids are considered to be in the same namespace
and must be unique: a deployment id cannot be the same as any other deployment
or model id unless it is the same as the id of the model being deployed. When
creating a new model, the id cannot match any existing model or deployment id.
Here we add synthetic source support for fields of type flattened.
Note that flattened fields with synthetic source have the following limitations,
all arising from the fact that in synthetic source we only see key/value pairs
when reconstructing the original object and have no type information in mappings:
* flattened fields use sorted set doc values of keywords, which means two things:
first, duplicate values are not preserved; second, all values are treated as keywords
* reconstructing an array of objects results in nested objects (no array)
* reconstructing an array with just one element results in a single-valued field, since we
have no way to distinguish single-valued from multi-valued fields other than by looking
at the count of values
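The limitations listed above follow directly from rebuilding an object out of key/value pairs. The minimal Python sketch below uses a hypothetical `synthesize_flattened` function, not the Elasticsearch implementation. It assumes the stored pairs are (dotted path, value) and models the sorted set per path as a Python set that is sorted on output.

```python
from typing import Any, Dict, Iterable, Set, Tuple


def synthesize_flattened(pairs: Iterable[Tuple[str, Any]]) -> Dict[str, Any]:
    """Rebuild a flattened field's source from (dotted path, value) pairs."""
    by_path: Dict[str, Set[str]] = {}
    for path, value in pairs:
        # Keyword doc values: every value becomes a string and duplicates collapse
        by_path.setdefault(path, set()).add(str(value))
    result: Dict[str, Any] = {}
    for path in sorted(by_path):
        values = sorted(by_path[path])
        node = result
        *parents, leaf = path.split(".")
        # Dotted paths become nested objects; arrays of objects cannot be recovered
        for part in parents:
            node = node.setdefault(part, {})
        # Only the value count tells single-valued and multi-valued fields apart
        node[leaf] = values[0] if len(values) == 1 else values
    return result
```

For an original value of `{"tags": ["b", "a", "a"], "count": [7], "items": [{"id": "1"}, {"id": "2"}]}`, the sketch produces `{"count": "7", "items": {"id": ["1", "2"]}, "tags": ["a", "b"]}`. Each limitation above shows up in that output.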
This change sets the stability of the ent-search APIs to beta and their visibility to public.
It also removes the feature flag link, since enabling the module is not considered a feature flag
and the module is enabled by default.
With this PR we introduce CRUD endpoints that update/delete the data lifecycle at the data stream level. When the lifecycle is updated, the change applies at the next DLM run to all the backing indices that are managed by DLM.
Document parsing methods currently throw MapperParsingException. This
isn't very helpful, as it doesn't contain any information about where the parse
error happened; it is designed for parsing mappings, which are realised into
Java maps before being examined. This commit introduces a new exception
specifically for document parsing that extends XContentException, so that
it reports the current position of the parser as part of its error message.
Fixes #85083
This adds a new parameter, `similarity`, to `knn` that allows filtering out nearest neighbor results that are outside a given similarity.
`num_candidates` and `k` are still required, as they control the nearest-neighbor vector search accuracy and exploration. For each shard, the query will search `num_candidates` and keep only those that are within the provided `similarity` boundary, and then finally reduce to only the global top `k` as normal.
For example, when using the `l2_norm` indexed similarity, this can be considered a `radius` post-filter on `knn`.
Relates to: https://github.com/elastic/elasticsearch/issues/84929 and https://github.com/elastic/elasticsearch/pull/93574
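Here is a minimal Python sketch of the per-shard flow above for the `l2_norm` case. Each shard collects `num_candidates` nearest vectors, drops those farther than the `similarity` radius, and the coordinator keeps the global top `k`. Exact distance computation stands in for the approximate vector search. The function names are hypothetical. The sketch assumes the threshold is compared against the raw L2 distance. For similarities where higher means closer, the comparison would flip; that case is not modeled here.

```python
import heapq
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]
Hit = Tuple[float, str]  # (l2 distance, doc id)


def shard_knn(docs: Sequence[Tuple[str, Vector]], query: Vector,
              num_candidates: int, similarity: float) -> List[Hit]:
    # Exact search stands in for the approximate per-shard vector search
    candidates = heapq.nsmallest(
        num_candidates, ((math.dist(vec, query), doc_id) for doc_id, vec in docs))
    # Post-filter: keep only candidates inside the similarity boundary (a radius for l2_norm)
    return [hit for hit in candidates if hit[0] <= similarity]


def knn_search(shards: Sequence[Sequence[Tuple[str, Vector]]], query: Vector,
               k: int, num_candidates: int, similarity: float) -> List[Hit]:
    hits: List[Hit] = []
    for docs in shards:
        hits.extend(shard_knn(docs, query, num_candidates, similarity))
    # Reduce to the global top k, exactly as without the similarity parameter
    return heapq.nsmallest(k, hits)
```

Because the filter runs after candidate collection, a search can return fewer than `k` hits when few vectors fall inside the boundary.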
This should help us ensure that the desired balance computation does not produce too many shard movements (which could be a sign of an unusual configuration or a bug), since these could eventually result in the actual cluster balance diverging far from the desired balance (a separate change is still required to warn/reset if the two are in fact far apart during the reconciliation step).
This change adds a new REST parameter, `rest_include_named_queries_score`, which, when set, includes the score of the named queries that matched the document.
Note that with this change, the score of named queries is always returned when using the transport client. At the REST level, the format of
the matched_queries section can be chosen for BWC (it is kept as is by default).
Closes #65563