72833 Commits

Author SHA1 Message Date
David Kyle f52d5fab7c
[ML] Rename the ELSER service (#100944) 2023-10-17 09:58:38 +01:00
David Turner 596c3cbd4b
TransportNodesAction impls are local-only (#100867)
There are no remote invocations of any actions derived from
`TransportNodesAction` so there is no need to register the top-level
action with the `TransportService`, and that means that all the code
related to de/serialization of the top-level request and response is
unused and can be removed.

Relates #100111 Relates #100878
2023-10-17 04:27:37 -04:00
David Turner b0469e90bc
Throttle per-index snapshot deletes (#100793)
Each per-index process during snapshot deletion takes some nonzero
amount of working memory to hold the relevant snapshot IDs and metadata
generations etc. which we can keep under tighter limits and release
sooner if we limit the number of per-index processes running
concurrently. That's what this commit does.
2023-10-17 04:08:25 -04:00
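The throttling idea in the commit above can be illustrated with a minimal sketch. It is not the Elasticsearch implementation: `run_throttled`, `process_index` and `max_concurrent` are hypothetical names, and a bounded thread pool stands in for whatever mechanism the commit actually uses.

```python
from concurrent.futures import ThreadPoolExecutor


def run_throttled(index_ids, process_index, max_concurrent):
    """Run process_index once per index, at most max_concurrent at a time.

    Hypothetical helper: bounding the worker count bounds how many
    per-index working sets are held in memory at once.
    """
    with ThreadPoolExecutor(max_workers=max_concurrent) as pool:
        # map() preserves input order; each worker's state can be
        # released as soon as its per-index task completes.
        return list(pool.map(process_index, index_ids))
```

With `max_concurrent` set low, peak memory scales with the limit rather than with the total number of indices in the snapshot.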
Ievgen Degtiarenko 14952cd7ea
Fix collision in field names (#100940)
Emitted metrics could not be indexed because the elasticsearch.metrics.s3.exceptions field
is both a long counter and a parent object for a histogram. This change renames
the histogram to avoid the conflict.
2023-10-17 09:40:41 +02:00
David Kyle 3a44943a78
[CI] Mute Failing tests in HeapAttackIT (#100900)
For #100678 and #100640
2023-10-17 09:14:44 +02:00
Ignacio Vera 3cd700073c
Add tolerance to ExtendedStatsAggregatorTests#testSummationAccuracy (#100917) 2023-10-17 08:08:04 +02:00
David Turner 27e8e8050b
Tidy up BlobStoreRepositoryTests (#100924)
There's no need for the `fslike` repository, the thread-name check it
exists to suppress permits execution on test threads so does not need
suppressing. This commit replaces it with a regular `fs` repository and
cleans up a couple of other nits.
2023-10-17 06:45:49 +01:00
Costin Leau 2418f8abab
ESQL: Preserve intermediate aggregation output in local relation (#100866)
Data nodes can fold a plan (typically for missing fields) to an empty
local relation as a logical optimization. However, the context, such
as whether the output is an aggregation or not, gets lost. This is
problematic during physical execution since the upstream aggregation
expects the intermediate states while the aggregation returns the final
ones.

Consider the query: from index | where field is not null | stats c =
count()

On shards where the field in the filter does not exist, the filter gets
nullified which folds the whole _local_ plan to a LocalRelation
returning c as 0. However the data node should return the intermediate
aggregation states (count and seen) - otherwise the query fails with an
internal error (NPE) since the expected channel by the exchange is not
found.

Fix #100807
2023-10-17 01:30:03 -04:00
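The difference between intermediate and final aggregation output described above can be sketched as follows. This is a toy model, not ESQL code: `CountState`, `data_node_count` and `coordinator_merge` are hypothetical names, and the meaning given to `seen` is an assumption based only on the "count and seen" wording in the message.

```python
from dataclasses import dataclass


@dataclass
class CountState:
    """Assumed intermediate state for count(): a partial count and a flag."""
    count: int
    seen: bool


def data_node_count(docs, field):
    """Partial result for `where field is not null | stats c = count()`."""
    if not any(field in doc for doc in docs):
        # Field missing on this shard: the local plan folds to a constant.
        # The result must still have the intermediate shape, not a bare 0.
        return CountState(count=0, seen=True)
    matching = sum(1 for doc in docs if doc.get(field) is not None)
    return CountState(count=matching, seen=True)


def coordinator_merge(states):
    """Combine intermediate states from all data nodes into the final count."""
    return sum(state.count for state in states)
```

If a data node returned a plain integer instead of a `CountState`, `coordinator_merge` would fail on the missing attribute, which loosely mirrors the internal error the commit describes.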
Ryan Ernst 32c50dc058
Separate version qualifier from version in build (#100868)
The build version is made up of a few parts in non-release builds. Both
the snapshot and pre-release qualifiers are appended to it. These
qualifiers used to be part of Version, but in 7.0 the qualifiers were
made to be found only in the build info. The Build class retains these
qualifiers through the compile ES version extracted from the server jar
at runtime.

Build.qualifiedVersion() is supposed to provide the fully qualified
version, including snapshot and pre-release qualifiers. Yet
Build.version() also includes this information; there is no distinction
since the qualifier was moved to be only in the build info.

This commit separates the pre-release qualifier from the version. It
maintains bwc in talking to older nodes, passing the fully qualified
version there, but in current nodes splits out the pre-release qualifier
into a new member of Build.
2023-10-16 20:23:03 -07:00
Nhat Nguyen 6699a3194b
Perform enrich lookup with enrich_origin (#100856)
Direct access to the .enrich-* indices, which are restricted system 
indices, should not be granted to users. Instead, ESQL enrich lookup
should access these indices using the enrich_origin on behalf of the
user. With this change, the enrich lookup checks for the monitor_enrich
cluster privilege before performing the actual lookup with the
enrich_origin.

Spin-off from #100724
2023-10-16 16:39:33 -07:00
Costin Leau 5da8c6cc92
QL: Preserve subfields for invalid types (#100875)
In certain scenarios, a field can be mapped both as a primitive and
an object, causing it to be marked as unsupported and losing any
subfields that might have been discovered before.
This commit preserves them so that subfields are not incorrectly
reported as missing.

Fix #100869
2023-10-16 15:46:31 -07:00
Benjamin Trent 93583813f3
Refresh indices before checking disk usage (#100845)
* Refresh indices before checking disk usage

* switch from refresh to forceMerge
2023-10-16 17:35:54 -04:00
Nhat Nguyen cf8a6be77f
Ensure document order in enrich yaml test (#100863)
The test fails due to out-of-order documents in the enrich index. This 
can occur when replicas are initializing during indexing. To avoid this,
we just need to ensure there are no initializing shards before starting
indexing and disable shard relocations.

Closes #99807
2023-10-16 12:50:32 -07:00
Jonathan Buttner e26ad8f9d2
[ML] Adding request queuing for http requests (#100674)
* Tests are really slow

* Closing services

* Cleaning up code

* Fixing spotless

* Adding some logging for evictor thread

* Using a custom method for sending requests in the queue

* Adding timeout and rejection logic

* Fixing merge failure

* Revert "Adding timeout and rejection logic"

This reverts commit acc8ba0c0b.

* Removing rethrow

* Reverting node.java changes

---------

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
2023-10-16 14:15:02 -04:00
Albert Zaharovits 7df7776a72
Done (#100919)
It appears that some freshly generated tokens fail authn under
concurrency conditions. This change increases the verbosity of the
TokenService logging in order to track down exactly why the token is
not good for authn.

Related: https://github.com/elastic/elasticsearch/issues/85697
2023-10-16 13:47:12 -04:00
David Roberts 1846414946
[ML] Reduce chance of timeout in serverless ML autoscaling (#100910)
If ML serverless autoscaling fails to return a response within
the configured timeout period then the control plane autoscaler
will log an error. Too many of these errors will raise an alert,
therefore as much as possible should be done on the ML side to
_not_ time out.

Previously there were two possible causes of timeouts:

1. If a request for node stats from all ML nodes timed out
2. If a request to refresh the ML memory tracker timed out

The first case can happen if a node leaves the cluster at a bad
time and the message sent to it gets lost. The second case can
happen if searching the ML results indices for model size stats
documents is slow.

We can avoid timeouts in these two situations as follows:

1. There was no need to use the API to get the only value from
   the node stats that the autoscaler needs to know - the total
   amount of memory on each ML node is stored in a node attribute
   on startup so exists in cluster state
2. When we refresh the ML memory tracker we can just return stats
   that instruct the autoscaler to do nothing until the refresh
   is complete - this is functionally the same as timing out each
   request, but without generating error messages
2023-10-16 17:48:52 +01:00
Keith Massey 8e0fd222a9
Sending an index name to DocumentParsingObserver that is not ever null (#100862) 2023-10-16 11:02:42 -05:00
David Turner 8c0994e5f8
Use assertNoFailureListener in more places (#100892) 2023-10-16 16:46:00 +01:00
Dianna Hohensee 9084a63d5e
Name and comment improvements for IndexRecoveryIT.java (#100912) 2023-10-16 11:28:27 -04:00
Kostas Krikellas 30ac8feb8f
Skip cat tsdb test for versions 8.7-8.10 (#100914)
Yet another test affected by the fix for showing the synthetic source,
#98808. This can trigger an assert in older versions as the mapping they
produce (without synthetic source) doesn't match the one they may get
from the master, if the latter is in version 8.10+.

Fixes #100913
2023-10-16 11:00:30 -04:00
Nhat Nguyen e4ea68a104
Register TopN status in plugin's writables (#100874)
```
"node_failures": [
  {
    "type": "failed_node_exception",
    "reason": "Failed node [qpdSPb3yQkuDlsI9TH7a2g]",
    "node_id": "qpdSPb3yQkuDlsI9TH7a2g",
    "caused_by": {
      "type": "transport_serialization_exception",
      "reason": "Failed to deserialize response from handler",
      "caused_by": {
        "type": "illegal_argument_exception",
        "reason": "Unknown NamedWriteable [org.elasticsearch.compute.operator.Operator$Status][topn]"
      }
    }
  }
]
```

I hit this error when trying to retrieve ESQL tasks. The issue is that we forgot
to register the NamedWriteable for the status of TopN.
2023-10-16 06:31:21 -07:00
Ignacio Vera 12f249ed8a
Set replica to 1 in 130_geo_shape_runtime.yaml (#100906)
Set number of replicas to 1 so the test can run against serverless.
2023-10-16 09:19:14 -04:00
Armin Braun 04f37dfcc2
Use pooled Netty allocator in tests by default (#100877)
We mostly run our tests with less than 1G of heap per JVM. This means that
we will use the unpooled Netty allocator in most tests, losing us a lot of
leak coverage in internal cluster tests (mostly for inbound buffers).
Unless otherwise specified by tests, we should force the use of our standard
allocator by default to get a higher chance of catching leaks in internalClusterTests
in particular.
2023-10-16 15:04:33 +02:00
Alan Woodward 31736fc754
Don't hold onto ClusterState reference in AbstractSearchAsyncAction (#100901)
If the cluster state is changing quickly while searches are starting
then these captured cluster states can consume substantial memory, and
we are only interested in two values here. This commit extracts the
two relevant values in the constructor, removing the cluster state
references entirely.

Closes #100120
2023-10-16 08:20:38 -04:00
Lorenzo Dematté a60a7890d0
Revert "Making yaml tests version selector parser compatible with versions returned by Build (#100794)" (#100889)
This reverts commit 5fe7e03248.
2023-10-16 13:28:47 +02:00
Ignacio Vera a82f0ac7b0
Add runtime field of type geo_shape (#100492)
This commit adds the possibility to create runtime fields of type geo-shape. In order to create them, users can 
define an emit function that takes either a geojson object or a WKT string that internally creates a geometry object.
2023-10-16 13:14:28 +02:00
Kostas Krikellas 3247accddf
[TEST] Assert that both time-series indexes are created (#100885)
* Assert that both time-series indexes are created

* Exclude from 8.7-8.10 mixedClusterTests

* Restore asserts

* Fix assert
2023-10-16 13:10:47 +03:00
David Kyle 83abb37f54
[ML] Use correct writable name for model assignment metadata in mixed cluster (#100886)
Older nodes will fail if they do not recognise the named writable
2023-10-16 11:10:04 +01:00
David Turner d01c61fbbe
Better failure logging in testFailsIfRegisterHoldsSpuriousValue (#100888)
Relates #99422
2023-10-16 06:07:41 -04:00
Kostas Krikellas 76b9d9591e Merge remote-tracking branch 'upstream/main' 2023-10-16 12:24:00 +03:00
Kostas Krikellas fe9995965f Revert "Assert that both time-series indexes are created"
This reverts commit baff9ae361.
2023-10-16 12:23:27 +03:00
Alan Woodward edab22a31c
Consistent scores for multi-term SourceConfirmedTestQuery (#100846)
SourceConfirmedTestQuery uses a QueryVisitor to collect terms from
its inner query to build its internal SimScorer. It is important to hold these
terms in a consistent order so that when scores for each term are summed,
the order of summation is the same as it would be for the inner query. This
commit changes the call to visit to use a LinkedHashSet to ensure that
terms are iterated in the order in which they are collected.

Fixes #98712
2023-10-16 10:11:10 +01:00
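Why iteration order matters for the summed scores above can be shown with a short sketch. Floating-point addition is not associative, so the same per-term weights can sum to different totals in different orders. The names `ordered_unique` and `total_score` are hypothetical, and `dict.fromkeys` is used here as a Python stand-in for the insertion-ordered `LinkedHashSet` the commit mentions.

```python
def ordered_unique(terms):
    """Deduplicate terms while keeping first-seen order."""
    return list(dict.fromkeys(terms))


def total_score(terms, weights):
    """Sum per-term weights in the given iteration order."""
    total = 0.0
    for term in terms:
        total += weights[term]
    return total


weights = {"a": 0.1, "b": 0.2, "c": 0.3}
forward = total_score(["a", "b", "c"], weights)
backward = total_score(["c", "b", "a"], weights)
# forward and backward differ in the last bit, although the terms are the same
```

An unordered set gives no guarantee about which of these orders is used, so two queries over the same terms could report slightly different scores.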
David Roberts 43a4167528
[ML] Check for internal index searchability as well as active primary (#100852)
Currently, before performing operations that require the ML internal
indices be available we check whether their primary shards are active.

In stateless Elasticsearch we need to separately check whether the
indices are searchable, as search and indexing shards are separate.
2023-10-16 10:02:30 +01:00
David Roberts 3ccbb001e8
[Transform] Check for internal index searchability as well as active primary (#100851)
Currently, before performing operations that require the transform
internal index be available we check whether its primary shard is
active.

In stateless Elasticsearch we need to separately check whether the
index is searchable, as search and indexing shards are separate.
2023-10-16 09:34:58 +01:00
Kostas Krikellas 30c09f64d4 Merge remote-tracking branch 'upstream/main' 2023-10-16 11:17:00 +03:00
Kostas Krikellas baff9ae361 Assert that both time-series indexes are created 2023-10-16 11:16:53 +03:00
Lorenzo Dematté 5fe7e03248
Making yaml tests version selector parser compatible with versions returned by Build (#100794) 2023-10-16 09:34:16 +02:00
Julia Bardi 42cf90f67c
using all privileges (#100764) 2023-10-16 09:30:50 +02:00
David Turner cb184639d2
Execute local action via client in RemoteClusterNodesAction (#100876)
Rather than sending a nodes-info request to the local node via its
transport service, we should use the `Client` to invoke the action
directly.
2023-10-16 06:05:24 +01:00
Yang Wang 65b4d594ae
Push s3 requests count via metrics API (#100383)
This PR builds on top of #100464 to publish s3 request count via the metrics API.
The metric takes the name of `repositories.requests.count` with 
attributes/dimensions of 
`{"repo_type": "s3", "repo_name": "xxx", "operation": "xxx", "purpose": "xxx"}`.

Closes: ES-6801
2023-10-16 10:01:26 +11:00
David Turner ec819e4a23
Relax cleanup check in SnapshotStressTestsIT (#100855)
We can't assert no leaked blobs here because today the first cleanup
leaves the original `RepositoryData` in place so the second cleanup is
not a no-op.

Relates #100718
2023-10-15 16:50:42 +01:00
Ignacio Vera c00b626dfb
Add test that proves you can write a ByteReference using its iterator (#100703) 2023-10-14 13:04:36 +02:00
Brian Seeders 17ef0af4f8
[buildkite] Upload build artifact and add to build scan (#100842) 2023-10-13 16:35:32 -04:00
Dianna Hohensee 323d9366df
Stabilize testRerouteRecovery throttle testing (#100788)
Refactor testRerouteRecovery, pulling out testing of shard recovery
throttling into separate targeted tests. Now there are two additional
tests, one testing source node throttling, and another testing target
node throttling. Throttling both nodes at once leads to primarily the
source node registering throttling, while the target node mostly has
no cause to instigate throttling.
2023-10-13 15:45:26 -04:00
Mark Vieira b8a204f428
Avoid eagerly creating spotless task in esql:compute project (#100789) 2023-10-13 09:59:03 -07:00
Jake Landis 1eaa907052
Fix manage/monitor_enrich documentation (#100781)
manage_enrich is a cluster privilege, not a built-in role.
manage_enrich is already documented as a cluster privilege.
This commit removes manage_enrich from the role documentation.
This commit also mentions the monitor_enrich privilege introduced in #99646.

related: #85877
2023-10-13 11:29:48 -05:00
Athena Brown be136c8f57
Fix NullPointerException in RotatableSecret (#100779)
This commit fixes two things:
1) RotatableSecret#matches could throw a NullPointerException when the current secret is null but the prior secret is not.
2) RotatableSecret#checkExpired would not expire a prior secret when checking the same millisecond the prior secret was due to expire.

Both of these would cause intermittent test failures, the first based on randomization, the second based on timing.
2023-10-13 10:25:01 -06:00
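Both fixes in the commit above can be illustrated with a minimal sketch. It is not the `RotatableSecret` class itself: `RotatableSecretSketch`, its fields and its millisecond clock argument are hypothetical. `hmac.compare_digest` raises `TypeError` when given `None`, which serves as a rough analogue of the `NullPointerException`.

```python
import hmac


class RotatableSecretSketch:
    """Hypothetical model of a secret with an optional, expiring prior value."""

    def __init__(self, current, prior=None, prior_valid_till_ms=0):
        self.current = current
        self.prior = prior
        self.prior_valid_till_ms = prior_valid_till_ms

    def check_expired(self, now_ms):
        # Use >= so the prior secret is dropped in the exact millisecond
        # it is due to expire, not one millisecond later.
        if self.prior is not None and now_ms >= self.prior_valid_till_ms:
            self.prior = None

    def matches(self, candidate, now_ms):
        self.check_expired(now_ms)
        # Guard each secret separately: current may be None while prior is set.
        for secret in (self.current, self.prior):
            if secret is not None and hmac.compare_digest(secret, candidate):
                return True
        return False
```

For example, `RotatableSecretSketch(None, "old", 1000).matches("old", 999)` succeeds without touching the missing current secret, while the same call at `now_ms=1000` fails because the prior secret has just expired.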
David Roberts 4d55f37427
[Transform] Consider task cancelled exceptions as recoverable (#100828)
A task cancelled exception has REST status 400, which makes it
irrecoverable as far as transforms is concerned. This means that
a transform that suffers such an exception will fail without
doing any retries. This is bad, because a search can fail with
a task cancelled exception if one of its lower level phases
suffers a circuit breaker exception. We want transforms to retry
in the event of there temporarily not being sufficient memory
for a search.
2023-10-13 17:06:09 +01:00
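The retry decision described above can be sketched as a small classification function. This is an illustration only: `is_recoverable`, the `IRRECOVERABLE_STATUSES` set and the error type strings are assumptions, and the real transform code considers more cases than this.

```python
# Assumption for illustration: status 400 is treated as irrecoverable,
# as the commit message states for task cancelled exceptions.
IRRECOVERABLE_STATUSES = {400}

# Assumed error type name, used here only as a label.
TASK_CANCELLED = "task_cancelled_exception"


def is_recoverable(error_type, rest_status):
    """Return True if a transform should retry after this failure."""
    if error_type == TASK_CANCELLED:
        # A cancelled search may only reflect a temporary memory shortage
        # in a lower-level phase, so retry despite the 400 status.
        return True
    return rest_status not in IRRECOVERABLE_STATUSES
```

Checking the error type before the status code is the key design point: the status alone cannot distinguish a cancelled task from a genuinely bad request.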
David Kyle 2ce5392ebd
[ML] Extra logging for debugging rolling upgrade test failure #100800
For investigating #100371
2023-10-13 15:42:10 +01:00
gheorghepucea cb30096c65
Referenced the svgs of starts_with and trim in asciidoc for consistency. (#100834) 2023-10-13 16:01:47 +02:00