This introduces a basic public yaml rest test plugin that is supposed to be used by external
elasticsearch plugin authors. This is driven by #76215
- Rename yaml-rest-test to intern-yaml-rest-test
- Use public yaml plugin in example plugins
Co-authored-by: Mark Vieira <portugee@gmail.com>
When performing incremental reductions, 0 value of docCountError may mean that
the error was not previously calculated, or that the error was indeed previously
calculated and its value was 0. We end up rejecting true values set to 0 this
way. This may lead to wrong upper bound of error in result. To fix it, this PR
makes docCountError nullable. null values mean that error was not calculated
yet.
Fixes#40005
Co-authored-by: Igor Motov <igor@motovs.org>
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
Add a dynamic transient cluster setting search.max_async_search_response_size
that controls the maximum allowed size for a stored async search
response. The default max size is 10Mb. An attempt to store
an async search response larger than this size will result in error.
Relates to #67594
Aiming for configuring less during the build,
this removes non required configuration from qa build scripts that do not
contain any sources. We also remove a few non required afterEvaluate hooks
Change the formatter config to sort / order imports, and reformat the
codebase. We already had a config file for Eclipse users, so Spotless now
uses that.
The "Eclipse Code Formatter" plugin ought to be able to use this file as
well for import ordering, but in my experiments the results were poor.
Instead, use IntelliJ's `.editorconfig` support to configure import
ordering.
I've also added a config file for the formatter plugin.
Other changes:
* I've quietly enabled the `toggleOnOff` option for Spotless. It was
already possible to disable formatting for sections using the markers
for docs snippets, so enabling this option just accepts this reality
and makes it possible via `formatter:off` and `formatter:on` without
the restrictions around line length. It should still only be used as
a very last resort and with good reason.
* I've removed mention of the `paddedCell` option from the contributing
guide, since I haven't had to use that option for a very long time. I
moved the docs to the spotless config.
This change integrates the circuit breaker in AsyncTaskIndexService to
make sure that we won't hit OOM when serializing a large response of an
async search.
Related to #67594
Supersedes #73638
Co-authored-by: Mayya Sharipova <mayya.sharipova@elastic.co>
When libs/core was created, several classes were moved from server's
o.e.common package, but they were not moved to a new package. Split
packages need to go away long term, so that Elasticsearch can even think
about modularization. This commit moves all the classes under o.e.common
in core to o.e.core.
relates #73784
Extract usage of internal API from TestClustersPlugin and PluginBuildPlugin and related plugins and build logic
This includes a refactoring of ElasticsearchDistribution to handle types
better in a way we can differentiate between supported Elasticsearch
Distribution types supported in TestCkustersPlugin and types only supported
in internal plugins.
It also introduces a set of internal versions of public plugins.
As part of this we also generate the plugin descriptors now.
As a follow up on this we can actually move these public used classes into
an extra project (declared as included build)
We keep LoggedExec and VersionProperties effectively public And workaround for RestTestBase
Related to #71593 we move all build logic that is for elasticsearch build only into
the org.elasticsearch.gradle.internal* packages
This makes it clearer if build logic is considered to be used by external projects
Ultimately we want to only expose TestCluster and PluginBuildPlugin logic
to third party plugin authors.
This is a very first step towards that direction.
Today when creating an internal test cluster, we allow the test to
supply the node settings that are applied. The extension point to
provide these settings has a single integer parameter, indicating the
index (zero-based) of the node being constructed. This allows the test
to make some decisions about the settings to return, but it is too
simplistic. For example, imagine a test that wants to provide a setting,
but some values for that setting are not valid on non-data nodes. Since
the only information the test has about the node being constructed is
its index, it does not have sufficient information to determine if the
node being constructed is a non-data node or not, since this is done by
the test framework externally by overriding the final settings with
specific settings that dicate the roles of the node. This commit changes
the test framework so that the test has information about what settings
are going to be overriden by the test framework after the test provide
its test-specific settings. This allows the test to make informed
decisions about what values it can return to the test framework.
Related to #51816
Makes `Route`s `RestApiVersion` -aware (and `RestHandler`s `RestApiVersion` -agnostic). Refactors
how `Route`s are constructed in the case of deprecation or replacement of routes.
This reduces the ceremony declaring test artifacts for a project.
It also solves an issue with usage of deprecated testRuntime that
testArtifacts extendsFrom which seems not required at all and would have
broke with Gradle 7.0 anyhow
Test artifact resolution is now variant aware which allows us a more adequate
compile and runtime classpath for the consuming projects.
We also Introduce a convention method in the elasticsearch build to declare
test artifact dependencies in an easy way close to how its done by the gradle build in
test fixture plugin.
Furthermore we cleaned up some inconsistent test dependencies declarations when
relying on a project and on its test artifacts
This change adds a new cluster privilege cancel_task that allows to:
Cancel running tasks (_tasks/_cancel).
Cancel and delete async searches.
Today the 'manage' cluster privilege is required to cancel tasks and
to delete async searches when security features are enabled.
This new focused privilege allows to handle tasks and searches only.
The change also adds the privilege to the internal 'kibana_system'
and '_async_search' roles. They both need to be able to cancel tasks
and delete async searches.
Relates #67965
Introduce eql search status API,
that reports the status of eql stored or async search.
GET _eql/search/status/<id>
The API is restricted to the monitoring_user role.
For a running eql search, a response has the following format:
{
"id" : <id>,
"is_running" : true,
"is_partial" : true,
"start_time_in_millis" : 1611690235000,
"expiration_time_in_millis" : 1611690295000
}
For a completed eql search, a response has the following format:
{
"id" : <id>,
"is_running" : false,
"is_partial" : false,
"expiration_time_in_millis" : 1611690295000,
"completion_status" : 200
}
Closes#66955
This change allows users that do not initiated an async search to delete it
if they have the cluster manage and manage-security privilege.
It is equivalent to the cancellation of tasks through the task manager (same privilege required)
and will allow users with the right permissions to cancel/delete async searches if they know
the async execution id.
This has been deprecated in gradle before but we havnt been warned.
Gradle 7.0 will likely introduce a change in behaviour here that we
should fix the usage of this configuration upfront.
See https://github.com/gradle/gradle/issues/16027 for further information
about the change in Gradle 7.0
As per the new licensing change for Elasticsearch and Kibana this commit
moves existing Apache 2.0 licensed source code to the new dual license
SSPL+Elastic license 2.0. In addition, existing x-pack code now uses
the new version 2.0 of the Elastic license. Full changes include:
- Updating LICENSE and NOTICE files throughout the code base, as well
as those packaged in our published artifacts
- Update IDE integration to now use the new license header on newly
created source files
- Remove references to the "OSS" distribution from our documentation
- Update build time verification checks to no longer allow Apache 2.0
license header in Elasticsearch source code
- Replace all existing Apache 2.0 license headers for non-xpack code
with updated header (vendored code with Apache 2.0 headers obviously
remains the same).
- Replace all Elastic license 1.0 headers with new 2.0 header in xpack.
This change removes a test that tries to simulate failures when indexing the response of an async search request.
The test is flaky and doesn't simulate the errors correctly so it was disabled. This change removes it entirely
since it doesn't add any value.
Closes#63948
There can be a race between two GET async search requests, and the one
with a lower keep_alive parameter wins the race. This scenario is not
desirable as we should retain the search result for all requests. This
commit ensures the keep_alive is extended and never goes backward.
Closes#64824. Introduce the concept of categories to deprecation
logging. Every location where we log a deprecation message must now
include a deprecation category.
We decided to rename `QueryShardContext` to clarify that it supports all parts
of search request execution. Before there was confusion over whether it should
only be used for building queries, or maybe only used in the query phase. This
PR also updates the javadocs.
Closes#64740.
We were depending on the BouncyCastle FIPS own mechanics to set
itself in approved only mode since we run with the Security
Manager enabled. The check during startup seems to happen before we
set our restrictive SecurityManager though in
org.elasticsearch.bootstrap.Elasticsearch , and this means that
BCFIPS would not be in approved only mode, unless explicitly
configured so.
This commit sets the appropriate JVM property to explicitly set
BCFIPS in approved only mode in CI and adds tests to ensure that we
will be running with BCFIPS in approved only mode when we expect to.
It also sets xpack.security.fips_mode.enabled to true for all test clusters
used in fips mode and sets the distribution to the default one. It adds a
password to the elasticsearch keystore for all test clusters that run in fips
mode.
Moreover, it changes a few unit tests where we would use bcrypt even in
FIPS 140 mode. These would still pass since we are bundling our own
bcrypt implementation, but are now changed to use FIPS 140 approved
algorithms instead for better coverage.
It also addresses a number of tests that would fail in approved only mode
Mainly:
Tests that use PBKDF2 with a password less than 112 bits (14char). We
elected to change the passwords used everywhere to be at least 14
characters long instead of mandating
the use of pbkdf2_stretch because both pbkdf2 and
pbkdf2_stretch are supported and allowed in fips mode and it makes sense
to test with both. We could possibly figure out the password algorithm used
for each test and adjust password length accordingly only for pbkdf2 but
there is little value in that. It's good practice to use strong passwords so if
our docs and tests use longer passwords, then it's for the best. The approach
is brittle as there is no guarantee that the next test that will be added won't
use a short password, so we add some testing documentation too.
This leaves us with a possible coverage gap since we do support passwords
as short as 6 characters but we only test with > 14 chars but the
validation itself was not tested even before. Tests can be added in a followup,
outside of fips related context.
Tests that use a PKCS12 keystore and were not already muted.
Tests that depend on running test clusters with a basic license or
using the OSS distribution as FIPS 140 support is not available in
neither of these.
Finally, it adds some information around FIPS 140 testing in our testing
documentation reference so that developers can hopefully keep in
mind fips 140 related intricacies when writing/changing docs.
For async searches (EQL included) the client's request headers were
erroneously stored in the .tasks index. This might expose the requesting
client's HTTP Authorization header. This PR fixes that by employing the
usual approach to store only the security-internal headers, which carry
the authentication result, instead of the original Authorization header,
which is commonly utilized to redo authentication for scheduled tasks.
This was observed in #65405 due to trying to locally execute a
task whose parent was already cancelled but is a general issue.
We should not throw from APIs that consume a listener as this may
like in this case leak the listener in that case.
Rather than fixing the specific case of #65405 this fixes the
abstract client overall to avoid other such leaks.
Closes#65405
This replaces the `SearchContext` passed to the ctor of `Aggregation`s
with `AggregationContext`. It ends up adding a fairly large number of
methods to `AggregationContext` but in exchange it shows a path to
removing a few methods from `SearchContext`. That seems nice!
It also gives us an accurate inventory of "all of the stuff" that
aggregations use to build and run.
Today search responses do not report failures for shard that were not available
for the search.
So if one shard is not assigned on a search over 5 shards, the
search response will report:
```
"_shards": {
"total": 5,
"successful": 4,
"skipped": 0,
"failed": 0
}
```
If all shards are unassigned, we report a generic search phase exception with no cause.
It's easy to spot that `successful` is less than `total` in the response but not reporting
the failure is misleading for users.
This change removes the special handling of not available shards exception in search responses
and treat them as any other failure that could occur on a shard.
These exceptions will count in the `failed` section and will be reported in details in
the `shard_failures` section.
If all shards are unavailable, the search API will now return 404 NOT_FOUND as an indication
that the search failed because it couldn't find any of the resources.
Closes#47700
A class with 2 fields does not need a builder, especially
when in many cases the builder result is just equivalent to the
`EMPTY` singleton to begin with.
Removed the builder and simplified related code accordingly.
This commit renames the test module used by async-search so that it does
not trip the check on test modules published with snpashot builds. A
more robust approach will be added in the future so this is not a module
at all, but for now this fixes CI from breaking on release tests.
closes#64910
When rest tests install plugins, they currently treat xpack as plugins
instead of modules. This was a side effect of splitting the rest test
plugin out from PluginBuildPlugin. However, that means xpack modules are
being run through the elasticsearch plugin installer, which is not
guaranteed to work. This commit reworks rest tests to treat anything
under xpack as a module. A side effect of this is that QA plugins under
xpack get treated as modules. This is ok because they are just for
testing, we don't need to validate them in the same way we do actual
plugins. Additionally, this commit relaxes the expection that modules
are only added for the integ test distribution. There is already a check
to not overwrite an existing module, so this wasn't a useful
optimization anyways.
Since #63520, we will cancel a search that hits shard failures and does
not accept partial results. However, that change can return the wrong
HTTP code for bad requests (from 4xx to 5xx) due to the cancellation.
Relates #63520Closes#64012Closes#63702
* Async search should retry updates on version conflict
The _async_search APIs can throw version conflict exception when the internal response
is updated concurrently. That can happen if the final response is written while the user
extends the expiration time. That scenario should be rare but it happened in Kibana for
several users so this change ensures that updates are retried at least 5 times. That
should resolve the transient errors for Kibana. This change also preserves the version
conflict exception in case the retry didn't work instead of returning a confusing 404.
This commit also ensures that we don't delete the response if the search was cancelled
internally and not deleted explicitly by the user.
Closes#63213
Before this change we inspected the index when optimizing
`date_histogram` aggregations, precalculating the divisions for the
buckets for the entire range of dates on the index so long as there
aren't a ton of these buckets. This works very well when you query all
of the dates in the index which is quite common - after all, folks
frequently want to query a week of data and have daily indices.
But it doesn't work as well when the index is much larger than the
query. This is quite common when dumping data into ES just to
investigate it but less common in the traditional time series use case.
But even there it still happens, it is just less impactful. Consider
the default query produced by Kibana's Discover app: a range of 15
minutes and a interval of 30 seconds. This optimization saves something
like 3 to 12 nanoseconds per document, so that 15 minutes would have to
have hundreds of millions of documents for it to be impactful.
Anyway, this commit takes the query into account when precalculating the
buckets. Mostly this is good when you have "dirty data". Immagine
loading 80 billion docs in an index to investigate them. Most of them
have dates around 2015 and 2016 but some have dates in 1970 and
others have dates in 2030. These outlier dates are "dirty" "garbage".
Well, without this change a `date_histogram` across many of these docs
is significantly slowed down because we don't precalculate the range due
to the outliers. That's just rude! So this change takes the query into
account.
The bulk of the code change here is plumbing the query into place. It
turns out that its a *ton* of plumbing, so instead of just adding a
`Query` member in hundreds of args replace `QueryShardContext` with a
new `AggregationContext` which does two things:
1. Has the top level `Query`.
2. Exposes just the parts of `QueryShardContext` that we actually need
to run aggregation. This lets us simplify a few tests now and will
let us simplify many, many tests later.
This PR adds deprecation warnings when accessing System Indices via the REST layer. At this time, these warnings are only enabled for Snapshot builds by default, to allow projects external to Elasticsearch additional time to adjust their access patterns.
Deprecation warnings will be triggered by all REST requests which access registered System Indices, except for purpose-specific APIs which access System Indices as an implementation detail a few specific APIs which will continue to allow access to system indices by default:
- `GET _cluster/health`
- `GET {index}/_recovery`
- `GET _cluster/allocation/explain`
- `GET _cluster/state`
- `POST _cluster/reroute`
- `GET {index}/_stats`
- `GET {index}/_segments`
- `GET {index}/_shard_stores`
- `GET _cat/[indices,aliases,health,recovery,shards,segments]`
Deprecation warnings for accessing system indices take the form:
```
this request accesses system indices: [.some_system_index], but in a future major version, direct access to system indices will be prevented by default
```
We need to complete the search before closing the iterator, which
internally closes the point in time; otherwise, the search will fail
with a missing context error.
Closes#62451
This commit allows coordinating node to account the memory used to perform partial and final reduce of
aggregations in the request circuit breaker. The search coordinator adds the memory that it used to save
and reduce the results of shard aggregations in the request circuit breaker. Before any partial or final
reduce, the memory needed to reduce the aggregations is estimated and a CircuitBreakingException} is thrown
if exceeds the maximum memory allowed in this breaker.
This size is estimated as roughly 1.5 times the size of the serialized aggregations that need to be reduced.
This estimation can be completely off for some aggregations but it is corrected with the real size after
the reduce completes.
If the reduce is successful, we update the circuit breaker to remove the size of the source aggregations
and replace the estimation with the serialized size of the newly reduced result.
As a follow up we could trigger partial reduces based on the memory accounted in the circuit breaker instead
of relying on a static number of shard responses. A simpler follow up that could be done in the mean time is
to [reduce the default batch reduce size](https://github.com/elastic/elasticsearch/issues/51857) of blocking
search request to a more sane number.
Closes#37182