All the flaky failures started occurring after:
https://github.com/elastic/elasticsearch/pull/99689
This indicates to me that all these tests need re-working due to segment
concurrency. In an effort to get coverage back for our testing,
concurrency is disabled again so these tests can be unmuted.
closes: https://github.com/elastic/elasticsearch/issues/99929
I verified that the test passes by running it hundreds of times locally. I added
trace logging just in case.
Currently the rest.suppressed logger in RestResponse logs the request path and
the request parameters for 500 errors (warn level) and 400s (debug). In order to
be able to filter on those status codes more efficiently we should add them to
the log message.
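As a rough illustration of the intended message shape (not the actual `RestResponse` code), the sketch below appends the status code to the path and parameters. The class name, method names, and exact message layout are assumptions made for this example only.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of Elasticsearch: shows one way the status
// code could be carried in the suppressed-error log line.
final class SuppressedLogLine {
    /** Renders path, params, and HTTP status into a single filterable message. */
    static String format(String path, Map<String, String> params, int status) {
        // TreeMap gives a stable parameter order in the rendered message.
        return String.format(Locale.ROOT, "path: %s, params: %s, status: %d", path, new TreeMap<>(params), status);
    }

    /** Per the description above: 500-level errors log at warn, 400-level at debug. */
    static String levelFor(int status) {
        return status >= 500 ? "WARN" : "DEBUG";
    }
}
```

With the status in the message, a log query can match on it directly instead of inferring it from the log level.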
Today, we have a hierarchy of tasks in ESQL designed to leverage the
task framework for reporting status and cancellation.
```mermaid
flowchart
RESTLayer -->| EsqlQueryRequest indices:data/read/esql | ComputeService
ComputeService -->| DriverRequest indices:data/read/esql/compute | Driver
ComputeService -->| DataNodeRequest indices:data/read/esql/data | DataNode
DataNode -->| DriverRequest indices:data/read/esql/compute | Driver
Driver -->| LookupRequest indices:data/read/esql/lookup | EnrichLookupService
```
The primary issue here is that `DriverRequest` is neither an
`IndicesRequest` nor a `CompositeIndicesRequest`. Consequently, the Driver
is executed within the context of the system user, so indices are
accessed as the system user.
To address this issue, this PR makes `DriverRequest` a
`CompositeIndicesRequest` and ensures that the Driver executes within
the user's context. With this fix we can now properly capture the
response headers when a Driver is yielded and rescheduled.
Relates #100707
Relates #99646
Relates #99926
Closes #100164
Adjust DateHistogram's consumeBucketsAndMaybeBreak accounting to happen iteratively during the reduce instead of accounting for all buckets at the end of the reduce.
With many non-empty buckets, accounting for the number of buckets at the end of the reduce may be too late: Elasticsearch may already have failed with an OOME. This change makes the accounting happen iteratively during the reduce for non-empty buckets.
Note that for empty buckets, the accounting of the number of buckets already happens iteratively.
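The difference between end-of-reduce and iterative accounting can be sketched as follows. This is a simplified model, not the DateHistogram code: `BucketLimiter` is a hypothetical stand-in for the bucket limit check, and the batch size is an assumed parameter.

```java
import java.util.List;
import java.util.function.IntConsumer;

// Simplified model of iterative bucket accounting during a reduce.
final class IterativeBucketAccounting {
    /** Hypothetical stand-in for the limit check; throws once the running total exceeds maxBuckets. */
    static final class BucketLimiter implements IntConsumer {
        private final int maxBuckets;
        private int total;

        BucketLimiter(int maxBuckets) {
            this.maxBuckets = maxBuckets;
        }

        @Override
        public void accept(int count) {
            total += count;
            if (total > maxBuckets) {
                throw new IllegalStateException("too many buckets: " + total + " > " + maxBuckets);
            }
        }

        int total() {
            return total;
        }
    }

    /** Reports non-empty buckets to the limiter in batches while reducing, not once at the end. */
    static int reduce(List<Long> docCounts, int batchSize, IntConsumer consumeBuckets) {
        int reduced = 0;
        int pending = 0;
        for (long docCount : docCounts) {
            if (docCount == 0) {
                continue; // empty buckets are assumed to be accounted for elsewhere
            }
            reduced++;
            if (++pending == batchSize) {
                consumeBuckets.accept(pending); // may throw long before the reduce finishes
                pending = 0;
            }
        }
        if (pending > 0) {
            consumeBuckets.accept(pending);
        }
        return reduced;
    }
}
```

Because the limiter is consulted every `batchSize` buckets, a breach surfaces after a bounded amount of extra work rather than after all buckets have been materialized.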
This PR makes sure that the start message ("deleting snapshots ...") is
logged after the cluster state is processed and any failures before
finishing updating the new repositoryData are always logged at warning
level.
Resolves: #99057
Resolves: #100481
Got asked why we have these classes again last night ... figured I'd
remove some of them to get us going here.
None of these classes are necessary. We can just inline all of them
away and make `ActionType` itself final or a record, now that the action
type consists only of a name and a reader. I think #97721 made these
classes redundant.
Fixes https://github.com/elastic/elasticsearch/issues/99570
Add support for DATE_PERIOD and TIME_DURATION values as input
parameters, e.g.
```
{
"query": "row a = 1 | eval x = now() + ?",
"params": [{"type":"time_duration", "value":"5 hours"}]
}
```
The values have to be passed as strings and then will be converted to
the appropriate type.
The original issue also pointed to similar problems for Version type, so
the PR also includes a test for this case.
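To illustrate the string-to-type conversion mentioned above, here is a minimal sketch that turns a value such as "5 hours" into a `java.time.Duration`. The class, the accepted unit names, and the error messages are assumptions for this example; the actual ESQL parsing may differ.

```java
import java.time.Duration;
import java.util.Locale;

// Hypothetical converter, not the ESQL implementation.
final class TimeDurationParam {
    /** Parses "<amount> <unit>", e.g. "5 hours", into a Duration. */
    static Duration parse(String value) {
        String[] parts = value.trim().split("\\s+");
        if (parts.length != 2) {
            throw new IllegalArgumentException("expected '<amount> <unit>' but got [" + value + "]");
        }
        long amount = Long.parseLong(parts[0]); // NumberFormatException is an IllegalArgumentException
        String unit = parts[1].toLowerCase(Locale.ROOT);
        switch (unit) {
            case "second":
            case "seconds":
                return Duration.ofSeconds(amount);
            case "minute":
            case "minutes":
                return Duration.ofMinutes(amount);
            case "hour":
            case "hours":
                return Duration.ofHours(amount);
            default:
                throw new IllegalArgumentException("unsupported unit [" + parts[1] + "]");
        }
    }
}
```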
To solve an issue in serverless where multiple trained model allocations
are assigned to a single node, this PR introduces the concept of "fake"
availability zones, where each node is treated as being in its own
availability zone.
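The idea can be sketched as below: if every node is its own zone, then any logic that spreads allocations across zones also spreads them across nodes. This is only an illustration; the class, the round-robin strategy, and the use of the node id as the zone name are assumptions, not the actual assignment planner.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of "fake" availability zones.
final class FakeZones {
    /** Treats every node as its own availability zone (zone name == node id, an assumption). */
    static Map<String, List<String>> perNodeZones(List<String> nodeIds) {
        Map<String, List<String>> zones = new LinkedHashMap<>();
        for (String nodeId : nodeIds) {
            zones.put(nodeId, List.of(nodeId));
        }
        return zones;
    }

    /** Round-robins allocations across zones and returns the allocation count per node. */
    static Map<String, Integer> spread(int allocations, Map<String, List<String>> zones) {
        if (zones.isEmpty()) {
            throw new IllegalArgumentException("no zones available");
        }
        Map<String, Integer> perNode = new LinkedHashMap<>();
        List<String> zoneNames = new ArrayList<>(zones.keySet());
        for (int i = 0; i < allocations; i++) {
            String zone = zoneNames.get(i % zoneNames.size());
            String node = zones.get(zone).get(0); // each fake zone holds exactly one node
            perNode.merge(node, 1, Integer::sum);
        }
        return perNode;
    }
}
```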
Added some logic to GeoTileUtils#toBoundingBox that makes sure the generated bounding boxes
are consistent with the arithmetic solution, so both approaches can be used interchangeably.
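For reference, and assuming that "the arithmetic solution" refers to the standard Web Mercator tile formula, a tile's bounding box can be computed directly as sketched below. The class and method names are made up for this example and are not the `GeoTileUtils` implementation.

```java
// Standalone sketch of the standard Web Mercator tile-to-bounds arithmetic.
final class TileBoundsSketch {
    /** Returns {west, south, east, north} in degrees for tile x/y at the given zoom. */
    static double[] toBoundingBox(int x, int y, int zoom) {
        double tiles = 1L << zoom; // number of tiles per axis at this zoom level
        double west = x / tiles * 360.0 - 180.0;
        double east = (x + 1) / tiles * 360.0 - 180.0;
        double north = edgeToLatitude(y, tiles);
        double south = edgeToLatitude(y + 1, tiles);
        return new double[] { west, south, east, north };
    }

    /** Latitude in degrees of the horizontal tile edge with the given index. */
    private static double edgeToLatitude(int edge, double tiles) {
        double n = Math.PI * (1.0 - 2.0 * edge / tiles);
        return Math.toDegrees(Math.atan(Math.sinh(n)));
    }
}
```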
8.10.4 includes a partial mitigation for the snapshots downgrades bug
introduced in 8.10.0. This commit adds known-issue docs for 8.10.4, and
adjusts the known-issue docs for earlier 8.10.x issues.
The `WaitForSnapshotStep` used to check if the SLM policy has been
executed after the index has entered the delete phase, but it did not
check if the SLM policy included this index.
As a result, if the user used an SLM policy that did not include this
index, the index would enter the `WaitForSnapshotStep` and wait for a
snapshot to be taken, that snapshot would not include the index, and
then ILM would delete the index.
See the exact reproduction path:
https://github.com/elastic/elasticsearch/issues/57809
**Solution** After this PR finds a successful SLM run, it verifies
that the snapshot taken by SLM contains this index. If it does not, the
step throws an error; otherwise it proceeds.
ILM explain will report:
```
"step_info": {
"type": "illegal_state_exception",
"reason": "the last successful snapshot of policy 'hourly-snapshots' does not include index '.ds-my-other-stream-2023.10.16-000001'"
}
```
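The check described above can be sketched roughly as follows. This is not the `WaitForSnapshotStep` code: the class and method are hypothetical, and the snapshot's indices are modeled as a plain set of names. Only the error message is taken from the ILM explain output shown above.

```java
import java.util.Set;

// Hypothetical sketch of the "snapshot must include the index" check.
final class SnapshotIndexCheck {
    /** Throws if the last successful snapshot of the policy does not contain the index. */
    static void ensureIncluded(String policy, Set<String> snapshotIndices, String index) {
        if (snapshotIndices.contains(index) == false) {
            throw new IllegalStateException(
                "the last successful snapshot of policy '" + policy + "' does not include index '" + index + "'"
            );
        }
    }
}
```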
**Backwards compatibility concerns** In this PR, the
`WaitForSnapshotStep` changed from `ClusterStateWaitStep` to
`AsyncWaitStep`. We do not think this will cause an issue. This was
tested manually with the following steps:
- Run a master node with the old version.
- When ILM is executing `wait-for-snapshot`, shut down the node.
- Start the node again with the new version of ES.
- ES was able to pick up the step and continue with the new code.
We believe that this covers bwc concerns.
Fixes: https://github.com/elastic/elasticsearch/issues/57809
The MappedActionFilter was added to make filtering more efficient, but
the api filtering action filter was not yet using it. This commit
adjusts this shared action filter to implement MappedActionFilter.
When a cluster state has been applied and right after the node becomes a candidate, there's a small race condition where the thread scheduling can lead to closing the PeerFinder while the node is a Candidate.
Closes #99023
* Break out 'Limitations' into separate page
* Add REST API docs
* Restructure commands, functions, and operators refs
* Add placeholder for getting started guide
* Group 'Syntax', 'Metafields', and 'MV fields' under 'Language'
* Add placeholder for Kibana page
* Add link from landing page
* Apply uniform formatting to ACOS, CASE, and DATE_PARSE function refs
* Reword default LIMIT
* Add support for COUNT(*)
* Move 'Commands' and 'Functions and operators' to individual pages
---------
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
Painless sandboxes some errors from Java from which it can recover. These
errors are wrapped within a ScriptException. However, retaining the
error as a cause can be confusing when walking the error chain. This
commit wraps the error so that the real error type does not appear,
but maintains the same error message in xcontent serialized form.
We'd like to make `SearchResponse` reference counted and pooled but there are around 6k
instances of tests that create a `SearchResponse` local variable that would need to be
released manually to avoid leaks in the tests.
This does away with about 10% of these spots by adding an override for `assertHitCount`
that handles the actual execution of the search request and its release automatically
and making use of it in all spots where the `.get()` on the request builder could be inlined
semi-automatically and in a straight-forward fashion without other code changes.
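The shape of such an override can be sketched as below. This is a simplified model rather than the test framework code: `Response` is a hypothetical stand-in for a ref-counted `SearchResponse`, and the supplier stands in for executing the request builder.

```java
import java.util.function.Supplier;

// Simplified model of an assertion helper that owns the response lifecycle.
final class HitCountAssertions {
    /** Hypothetical stand-in for a ref-counted search response. */
    interface Response {
        long totalHits();

        void decRef();
    }

    /** Executes the search, checks the hit count, and always releases the response. */
    static void assertHitCount(Supplier<Response> search, long expectedHitCount) {
        Response response = search.get();
        try {
            if (response.totalHits() != expectedHitCount) {
                throw new AssertionError("expected " + expectedHitCount + " hits but got " + response.totalHits());
            }
        } finally {
            response.decRef(); // released on success and on assertion failure
        }
    }
}
```

Because the helper both runs the request and releases the result, the test body never holds a response in a local variable that could leak.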
Recovery stats may contain additional entries, e.g. in case shards get
relocated. When restoring an index from its snapshot, it suffices to
check that the index is healthy and searchable.
Fixes #98746
This PR is migrating some of the ITs that use either the
`elasticsearch.legacy-java-rest-test` or the
`elasticsearch.legacy-yaml-rest-test` gradle test plugins to the new
`elasticsearch.internal-java-rest-test` and
`elasticsearch.internal-yaml-rest-test` equivalents. This is the list of
the affected ITs:
* SamlAuthenticationIT
* OperatorPrivilegesIT
* ProfileIT
* SetSecurityUserProcessorWithWithSecurityDisabledIT
* AsyncSearchSecurityIT
* SecurityRealmSmokeTestCase
* KibanaSystemIndexIT
* KerberosAuthenticationIT
* ReindexWithSecurityIT and ReindexWithSecurityClientYamlTestSuiteIT
* ReloadSecureSettingsWithPasswordProtectedKeystoreRestIT
* PermissionsIT from slm:qa:with-security
* PermissionsIT from runtime-fields:with-security
* PermissionsIT from ilm:qa:with-security
* GraphWithSecurityIT and GraphWithSecurityInsufficientRoleIT
Related: ES-6751