This commit adds the ability to specify exclusion patterns in Auto-Follow patterns. This allows excluding indices that match any of the inclusion patterns and also match some of the exclusion patterns giving more fine grained control in scenarios where this is important.
Related #67686
We changed the default joda behaviour in strict_date_optional_time to
max 4 digits in a year. Java.time implementation should behave the same way.
At the same time date_optional_time should have 9digits for year part.
closes#52396closes#72191
The feature branch contains changes to configure PyTorch models with a
TrainedModelConfig and defines a format to store the binary models.
The _start and _stop deployment actions control the model lifecycle
and the model can be directly evaluated with the _infer endpoint.
2 Types of NLP tasks are supported: Named Entity Recognition and Fill Mask.
The feature branch consists of these PRs: #73523, #72218, #71679#71323, #71035, #71177, #70713
Categorization jobs created once the entire cluster is upgraded to
version 7.14 or higher will default to using the new ml_standard
tokenizer rather than the previous default of the ml_classic
tokenizer, and will incorporate the new first_non_blank_line char
filter so that categorization is based purely on the first non-blank
line of each message.
The difference between the ml_classic and ml_standard tokenizers
is that ml_classic splits on slashes and colons, so creates multiple
tokens from URLs and filesystem paths, whereas ml_standard attempts
to keep URLs, email addresses and filesystem paths as single tokens.
It is still possible to config the ml_classic tokenizer if you
prefer: just provide a categorization_analyzer within your
analysis_config and whichever tokenizer you choose (which could be
ml_classic or any other Elasticsearch tokenizer) will be used.
To opt out of using first_non_blank_line as a default char filter,
you must explicitly specify a categorization_analyzer that does not
include it.
If no categorization_analyzer is specified but categorization_filters
are specified then the categorization filters are converted to char
filters applied that are applied after first_non_blank_line.
Closeselastic/ml-cpp#1724
This moves the public build api and plugins into a separete included build called 'build-tools'
and we removed the duplication of included buildSrc twice (2nd import as build-tools).
The elasticsearch internal build logic is kept in build-tools-internal as included build which allows us better handling of this project that its just being an buildSrc project (e.g. we can reference tasks directly from the root build etc.)
Convention logic applied to both projects will live in a new build-conventions project.
Instead of assertion that store stats has a non zero value,
just verify that a value is returned. Verifying whether a
non value or specific value is returned, isn't the purpose
of hlrc integration tests.
Closes#60461
This commit upgrades the Azure SDK to 12.11.0 and Jackson to 2.12.2. The
Jackson upgrade must happen at the same time due to Azure depending on
this new version of Jackson.
closes#66555closes#67214
Co-authored-by: Francisco Fernández Castaño <francisco.fernandez.castano@gmail.com>
This commit adds a `cancelled` flag to each cancellable task in the
response to the list tasks API, allowing users to see that a task has
been properly cancelled and will complete as soon as possible.
Closes#72907
Implements a V7 compatible typed endpoints for REST for search related apis
retrofits the REST layer change removed in #41640
relates main meta issue #51816
relates types removal issue #54160
Enroll node API can be used by new nodes in order to join an
existing cluster that has security features enabled. The response
of a call to this API contains all the necessary information that
the new node requires in order to configure itself and bootstrap
trust with the existing cluster.
This commit upgrades the Azure SDK to 12.11.0 and Jackson to 2.12.2. The
Jackson upgrade must happen at the same time due to Azure depending on
this new version of Jackson.
closes#66555closes#67214
Extract usage of internal API from TestClustersPlugin and PluginBuildPlugin and related plugins and build logic
This includes a refactoring of ElasticsearchDistribution to handle types
better in a way we can differentiate between supported Elasticsearch
Distribution types supported in TestCkustersPlugin and types only supported
in internal plugins.
It also introduces a set of internal versions of public plugins.
As part of this we also generate the plugin descriptors now.
As a follow up on this we can actually move these public used classes into
an extra project (declared as included build)
We keep LoggedExec and VersionProperties effectively public And workaround for RestTestBase
Previously, the ResetFeatureStateStatus object captured its status in a
String, which meant that if we wanted to know if something succeeded or
failed, we'd have to parse information out of the string. This isn't a
good way of doing things.
I've introduced a SUCCESS/FAILURE enum for status constants, and added a
check for failures in the transport action. We return a 207 if some but not all
reset actions fail, and for every failure, we also return information about the
exception or error that caused it.
Co-authored-by: Jay Modi <jaymode@users.noreply.github.com>
Related to #71593 we move all build logic that is for elasticsearch build only into
the org.elasticsearch.gradle.internal* packages
This makes it clearer if build logic is considered to be used by external projects
Ultimately we want to only expose TestCluster and PluginBuildPlugin logic
to third party plugin authors.
This is a very first step towards that direction.
This commit adds support for system data streams and also the first use
of a system data stream with the fleet action results data stream. A
system data stream is one that is used to store system data that users
should not interact with directly. Elasticsearch will manage these data
streams. REST API access is available for external system data streams
so that other stack components can store system data within a system
data stream. System data streams will not use the system index read and
write threadpools.
In #71701 we added a new REST API that provides statistics
about the searchable snapshots cache on Frozen Tier.
This commit adds the necessary plumbing to expose this API
in the High Level REST Client. It also exposes the documentation
of the Mount Snapshot API that was created in #68949 but not
made accessible.
The main changes are:
* Set the require_alias parameter in the reindex, index and update methods in RequestConverters class.
* Test that the parameter can be parsed correctly.
Closes#67819.
This change adds two new reserved roles in Elasticsearch, viewer
and editor
Solutions will use these roles in order to provide,out of the box,
a role for full access to data and configuration with no access to
cluster management(editor) and a role for read only access to data
again with no access to cluster management (viewer).
This change enables GeoIP downloader by default.
It removes feature flag but adds flag that is used by tests to disable it again (as we don't want to hammer GeoIP database service with every test cluster we spin up).
Relates to #68920
Now that we have a feature reset API, we should use
this for cleaning up in between tests instead of running
lots of bespoke cleanup code.
During testing of this change we found we need to
delete custom cluster state as part of the reset process,
so this PR also implements that.
Additionally we no longer assign persistent tasks
during feature reset.
In the high level rest client tests, we should clean up trained models, same as we do for datafeeds, jobs, etc.
But, to delete a trained model, it needs to not be used in a pipeline, so we grab the model stats.
This commit does:
- waiting for no more initializing shards when ML integration tests are cleaning up before attempting to delete things
- The specific inference stats test now waits for the `.ml-stats-00001` index to exist before continuing
- And we make sure that config index is green before attempting to run the `pipeline_inference.yml` tests
closes https://github.com/elastic/elasticsearch/issues/71139
previously created pipelines referencing ML models were not being appropriately deleted in upstream tests.
This commit ensures that machine learning removes relevant pipelines from cluster state after tests complete
closes#71072
This change exposes for each field in the _field_caps response if the field is a metadata field.
This is needed for consumers of this API that want to filter these fields. Currently ML keeps a static list
and QL checks that the family type starts with `_`. In order to ease the addition of new metadata fields, this
change reworks the strategy in this solution and now only checks for the new flag.
Note that the new flag is also applied at the coordinator level in a best-effort to apply the logic on older nodes
in a mixed-version cluster.
This PR adds metadata support for API keys. Metadata are of type
Map<String, Object> and can be optionally provided at API key creation time.
It is returned as part of GetApiKey response. It is also stored as part of
the authentication object to transfer throw the wire.
Note that it is not yet searchable and not exposed to any ingest processors.
They will be handled by separate PRs.
Previously, a datafeed and job must already exist for the `_preview` API to work.
With this change, users can get an accurate preview of the data that will be sent to the anomaly detection job
without creating either of them.
closes https://github.com/elastic/elasticsearch/issues/70264
Retrying a retryable step when the ILM loop is configured to `1s` (as we do in
the integration tests) is a recipe for flakiness - as ILM keeps moving a failed
step from `ERROR` back into the failed step for it to be retried, and the retry
step API fails if executed when ILM is not in the `ERROR` step.
This makes the documentation integration test more lenient.
This change adds _geoip/stats endpoint that can be used to collect basic data about geoip downloader (successful, failed and skipped downloads, current db count and total time spent downloading).
It also fixes missing/wrong origins for clients that will break if used with security.
Relates to #68920
As we collect points for our auc roc curve from two
different percentiles aggregations, the result may
contain points with equal threshold that are not
monotonic. This is because the percentiles aggregation
is an approximation.
This commit ensures the fina auc roc curve we calculate
is monotonic by collapsing points of equal threshold into
a single point that is the average of the equal threshold
points it represents.
When we disable access to system indices, plugins will still need
a way to erase their state. The obvious and most pressing use
case for this is in tests, which need to be able to clean up the
state of a cluster in between groups of tests.
* Use a HandledTransportAction for reset action
My initial cut used a TransportMasterNodeAction, which requires code
that carefully manipulates cluster state. At least for the first cut and
testing, it seems like it will be much easier to use a client within a
HandledTransportAction, which effectively makes the
TransportResetFeatureStateAction a class that dispatches other transport
actions to do the real work.
* Clean up code by using a GroupedActionListener
* ML feature state cleaner
* Implement Transform feature state reset
* Change _features/reset path to _features/_reset
Out of an abundance of caution, I think the "reset" part of this path
should have a leading underscore, so that if there's ever a reason to
implement "GET _features/<feature_id>" we won't have to worry about
distinguishing "reset" from a feature name.
Co-authored-by: Gordon Brown <gordon.brown@elastic.co>
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
improve robustness and ux in case of a missing transform node:
- warn if cluster lacks a transform node in all API's (except DELETE)
- report waiting state in stats if transform waits for assignment
- cancel p-task on stop transform even if config has been deleted
relates #69518
Currently we check several search parameters for illegal values in their
SearchSourceBuilder setters, e.g. negative values throw IAE for: `size`,
`terminateAfter` and `trackTotalHits`.
The validation in the builder setters are used when parsing the above as rest
request parameters, however we currently don't check values when parsing them
from the search request body. This leads to builders with invalid parameters
that sometimes get caucht later (e.g. a negative size is triggering an
IllegalArgumentException in TotalHitCountCollector), but we should validate and
throw errors early.
This PR changes the parsing in SearchSourceBuilder to use the setters, adds
tests and also adds a deprecation for allowing a size parameter of -1, currently
meaning an "unset" value.
Closes#54958
It can be confusing to configure policies with phase timings that get smaller, because phase timings
are absolute. To make things a little clearer, this commit now rejects policies where a configured
min_age is less than a previous phase's min_age.
This validation is added only to the PutLifecycleAction.Request instead of the
TimeseriesLifecycleType class because we cannot do this validation every time a lifecycle is
created or else we will block cluster state from being recoverable for existing clusters that may
have invalid policies.
Resolves#70032
Instead of using the same handler with new latches each time, test now registers completely new handler for every request. This prevents race condition that led to deadlock and timeout.
Local testing shows around 500k iterations without a failure.
Closes#45577
Today every `BulkProcessor` creates two scheduler threads, both called
`[node-name][scheduler][T#1]`, which is also the name of the main
scheduler thread for the node. The duplicated thread names make it
harder to interpret a thread dump.
This commit makes the names of these threads distinct.
Closes#68470
This commit introduces data_content, data_hot, data_warm,
data_cold, and data_frozen to the low level REST client (LLRC).
Since the LLRC only cares about dedicated masters this change
simply makes the LLRC aware of the new roles and does not have
any functional impact. The tests have been adjusted accordingly.
The endpoint `_snapshottable_features` is long and implies incorrect
things about this API - it is used not just for snapshots, but also for
the upcoming reset API. Following discussions on the team, this commit
changes the endpoint to `_features` and removes the connection between
this API and snapshots, as snapshots are not the only use for the output
of this API.
There were still some small places where we were utilizing deprecated options for getting
trained models.
This commit addresses it and utilizes the new API format.
Users can now specify runtime mappings as part of the source config
of a data frame analytics job. Those runtime mappings become part of
the mapping of the destination index. This ensures the fields are
accessible in the destination index even if the relevant data frame
analytics job gets deleted.
Closes#65056
The currently allowed delta in this tests is to strict to account for slight floating point
differences across different platforms, e.g. ARM.
Closes#68936
This PR expands the meaning of `include_global_state` for snapshots to include system indices. If `include_global_state` is `true` on creation, system indices will be included in the snapshot regardless of the contents of the `indices` field. If `include_global_state` is `true` on restoration, system indices will be restored (if included in the snapshot), regardless of the contents of the `indices` field. Index renaming is not applied to system indices, as system indices rely on their names matching certain patterns. If restored system indices are already present, they are automatically deleted prior to restoration from the snapshot to avoid conflicts.
This behavior can be overridden to an extent by including a new field in the snapshot creation or restoration call, `feature_states`, which contains an array of strings indicating the "feature" for which system indices should be snapshotted or restored. For example, this call will only restore the `watcher` and `security` system indices (in addition to `index_1`):
```
POST /_snapshot/my_repository/snapshot_2/_restore
{
"indices": "index_1",
"include_global_state": true,
"feature_states": ["watcher", "security"]
}
```
If `feature_states` is present, the system indices associated with those features will be snapshotted or restored regardless of the value of `include_global_state`. All system indices can be omitted by providing a special value of `none` (`"feature_states": ["none"]`), or included by omitting the field or explicitly providing an empty array (`"feature_states": []`), similar to the `indices` field.
The list of currently available features can be retrieved via a new "Get Snapshottable Features" API:
```
GET /_snapshottable_features
```
which returns a response of the form:
```
{
"features": [
{
"name": "tasks",
"description": "Manages task results"
},
{
"name": "kibana",
"description": "Manages Kibana configuration and reports"
}
]
}
```
Features currently map one-to-one with `SystemIndexPlugin`s, but this should be considered an implementation detail. The Get Snapshottable Features API and snapshot creation rely upon all relevant plugins being installed on the master node.
Further, the list of feature states included in a given snapshot is exposed by the Get Snapshot API, which now includes a new field, `feature_states`, which contains a list of the feature states and their associated system indices which are included in the snapshot. All system indices in feature states are also included in the `indices` array for backwards compatibility, although explicitly requesting system indices included in a feature state is deprecated. For example, an excerpt from the Get Snapshot API showing `feature_states`:
```
"feature_states": [
{
"feature_name": "tasks",
"indices": [
".tasks"
]
}
],
"indices": [
".tasks",
"test1",
"test2"
]
```
Co-authored-by: William Brafford <william.brafford@elastic.co>
This adds support for "Grant API Key" to the Java High Level Rest
Client.
This API was added in Elasticsearch 7.7 but did not have explicit support
in the HLRC.
Part 9.
We have an in-house rule to compare explicitly against `false` instead
of using the logical not operator (`!`). However, this hasn't
historically been enforced, meaning that there are many violations in
the source at present.
We now have a Checkstyle rule that can detect these cases, but before we
can turn it on, we need to fix the existing violations. This is being
done over a series of PRs, since there are a lot to fix.
Part 8.
We have an in-house rule to compare explicitly against `false` instead
of using the logical not operator (`!`). However, this hasn't
historically been enforced, meaning that there are many violations in
the source at present.
We now have a Checkstyle rule that can detect these cases, but before we
can turn it on, we need to fix the existing violations. This is being
done over a series of PRs, since there are a lot to fix.
As per the new licensing change for Elasticsearch and Kibana this commit
moves existing Apache 2.0 licensed source code to the new dual license
SSPL+Elastic license 2.0. In addition, existing x-pack code now uses
the new version 2.0 of the Elastic license. Full changes include:
- Updating LICENSE and NOTICE files throughout the code base, as well
as those packaged in our published artifacts
- Update IDE integration to now use the new license header on newly
created source files
- Remove references to the "OSS" distribution from our documentation
- Update build time verification checks to no longer allow Apache 2.0
license header in Elasticsearch source code
- Replace all existing Apache 2.0 license headers for non-xpack code
with updated header (vendored code with Apache 2.0 headers obviously
remains the same).
- Replace all Elastic license 1.0 headers with new 2.0 header in xpack.
Adds a X-Elastic-Client-Meta header to http requests sent by RestClient. This
header contains information about the runtime environment that is meant to
allow analyzing usage context by collecting this information on the receiving
side of requests, like a proxy server in front of ES.
Using a custom header allows client applications to change the User-Agent
header for their own purpose without losing this information.
The PR adds early_stopping_enabled optional data frame analysis configuration parameter. The enhancement was already described in elastic/ml-cpp#1676 and so I mark it here as non-issue.
We have an in-house rule to compare explicitly against `false` instead
of using the logical not operator (`!`). However, this hasn't
historically been enforced, meaning that there are many violations in
the source at present.
We now have a Checkstyle rule that can detect these cases, but before we
can turn it on, we need to fix the existing violations. This is being
done over a series of PRs, since there are a lot to fix.
transforms reports the the last time changes where detected with changes_last_detected_at, however
that doesn't tell a user it searched for changes, this PR adds a field last_search_time to report
when transform searched for changes the last time.
fixes#66410
relates #66367
This adds the multi custom feature processor to data frame analytics and inference.
The `multi_encoding` processor allows custom processors to be chained together and use the outputs from one processor as the inputs to another.
Find file structure finder is now its own plugin, and separated from the ml plugin.
This commit updates the rest high level client to reflect this.
Additionally, this adjusts the internal and client object names from `FileStructure` to the more general `TextStructure`
This renames the text structure finder action to match the plugin name.
Also, this adds a new reserved role name so that adding specific permissions for this API is simple.
This introduces a new `text-structure` plugin. This is the new home of the find file structure API.
The old REST URL is still available but is deprecated.
The new URL is: `_text_structure/find_structure`. All parameters and behavior are unchanged.
Changes to the high-level REST client and docs will be in separate commit.
related to: https://github.com/elastic/elasticsearch/issues/67001
VND types like application/vnd.elasticsearch+json
(XContentType.VND_JSON) when compared with
application/json (XContentType.JSON) should use `canonnical`
closes#67064
The delete phase can no longer be defined as empty, it requires at least
one action to be defined. This fixes the test that used an empty collection
of actions for the delete phase to include the delete action.