Commit Graph

9490 Commits

Author SHA1 Message Date
David Turner 0bf31b77fb
Fix message for stalled shutdown (#89254)
Today if a node shutdown is stalled due to unmoveable shards then we say
to use the allocation explain API to find details. In fact, since #78727
we include the allocation explanation in the response already so we
should tell users just to look at that instead. This commit adjusts the
message to address this.
2022-08-11 07:48:03 +01:00
Mary Gouseti 399a8ac283
Add TransportHealthNodeAction (#89127) 2022-08-10 17:04:22 +02:00
David Turner ceffaf9aad
Improve rejection of ambiguous voting config name (#89239)
Today if there are multiple nodes with the same name then
`POST /_cluster/voting_config_exclusions?node_names=ambiguous-name` will
return a `500 Internal Server Error` and a mysterious message. This
commit changes the behaviour to throw an `IllegalArgumentException`
(i.e. `400 Bad Request`) along with a more useful message describing the
problem.
2022-08-10 12:39:24 +01:00
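The behaviour change in #89239 can be illustrated with a small sketch. It is not the Elasticsearch implementation: `NodeNameResolver`, `resolveNodeId`, and the error text are hypothetical names chosen for illustration. The point is only that an ambiguous name is rejected with an `IllegalArgumentException` (which a REST layer would typically map to `400 Bad Request`) rather than failing later with an internal error.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Elasticsearch: resolves a node name to a node ID.
final class NodeNameResolver {

    // nodeNamesById maps node ID -> node name; names are not guaranteed to be unique.
    static String resolveNodeId(Map<String, String> nodeNamesById, String name) {
        List<String> matchingIds = new ArrayList<>();
        for (Map.Entry<String, String> entry : nodeNamesById.entrySet()) {
            if (entry.getValue().equals(name)) {
                matchingIds.add(entry.getKey());
            }
        }
        if (matchingIds.size() > 1) {
            // Reject the request up front with a descriptive message (illustrative wording).
            throw new IllegalArgumentException(
                "node name [" + name + "] is ambiguous, matching " + matchingIds.size() + " nodes"
            );
        }
        // null means that no node currently has this name.
        return matchingIds.isEmpty() ? null : matchingIds.get(0);
    }
}
```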
Ievgen Degtiarenko 72e24d38bb
Log when repository is marked as corrupted (#89132) 2022-08-10 08:12:00 +02:00
Nikola Grcevski 895baf011c
Delete invalid settings for system indices (#88903) 2022-08-09 17:11:55 -04:00
Stuart Tettemer 264f09f3d5
Script: Common base class for write scripts (#89141)
Adds `WriteScript` as the common base class for the write scripts: `IngestScript`, `UpdateScript`, `UpdateByQueryScript` and `ReindexScript`.

This pulls the common `getCtx()` and `metadata()` methods into the base class and prepares for the implementation of the ingest fields api (https://github.com/elastic/elasticsearch/issues/79155).

As part of the refactor, `IngestScript` now takes a `CtxMap` directly rather than taking "sourceAndMetadata" (`CtxMap`) and `Metadata` (from `CtxMap`).  There is a new `getCtxMap()` getter to get the typed `CtxMap`.  `getSourceAndMetadata` could have been refactored to do this, but most of the callers of that don't need to know about `CtxMap` and are happy with a `Map<String, Object>`.
2022-08-09 12:31:18 -05:00
David Turner de281b5072
Complete listener in ReservedStateErrorTaskExecutor (#89191) 2022-08-10 01:10:38 +09:30
Keith Massey e63bcb550e
Fixing internal action names (#89182)
Fixing the names of the internal actions used by CoordinationDiagnosticsService to begin with "internal:" so
that they can be used in the system context with security enabled.
2022-08-09 08:47:29 -05:00
Armin Braun c6c05bb625
Deduplicate ShardRouting instances when building ClusterInfo (#89190)
The equality checks on these in `DiskThresholdDecider` become very expensive
during reroute in a large cluster. Deduplicating these when building the `ClusterInfo`
saves more than 2% CPU time during many-shards benchmark bootstrapping because
the lookup of the shard data path by shard-routing mostly hit instance equality.
Also, this saves a little memory.

This PR also moves the callback for building `ClusterInfo` from the stats response to
the management pool as it is now more expensive (though the overall CPU use from it is trivial
relative to the cost savings during reroute) and was questionable to run on
a transport thread in a large cluster to begin with.

Co-authored-by: David Turner <david.turner@elastic.co>
2022-08-09 11:08:13 +02:00
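The deduplication idea in #89190 can be sketched generically as follows. This is a minimal illustration, assuming a hypothetical `Deduplicator` class; it is not the code from the PR. Once equal objects are replaced by one canonical instance, later `equals` calls between them can usually succeed on the cheap reference check instead of a field-by-field comparison.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: returns one canonical instance for every group of equal values.
final class Deduplicator<T> {
    private final Map<T, T> canonical = new HashMap<>();

    T deduplicate(T value) {
        // putIfAbsent returns the previously stored equal instance, or null if this is the first one.
        T existing = canonical.putIfAbsent(value, value);
        return existing == null ? value : existing;
    }
}
```

The map lookup itself still pays for one `hashCode` and possibly one `equals` call per value, which matches the commit's note that building the `ClusterInfo` becomes somewhat more expensive in exchange for cheaper lookups later.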
Armin Braun 254e6bcabd
Remove needless optimization ShardRouting.asList (#89179)
This iterator is never used in hot code (there seems to only be a single production code usage for it),
no need to cache a list here just for it.
2022-08-09 09:17:34 +02:00
Armin Braun f3659a64c9
Remove redundant and slow null token check from KeywordFieldMapper (#89168)
No need to check for the null token manually and parse `textOrNull`.
Either we can just use `text()` since we know we don't have to deal
with a null token or use `textOrNull` and check the return value for
`null`.
I chose the latter because it benchmarked slightly faster in `BeatsMapperBenchmark`
but both save the expensive call to `currentToken` on the heavily nested x-content
parser that we use here.
2022-08-08 19:18:29 +02:00
Armin Braun 2429dbc451
Dry up custom immutable Map.Entry implementations (#89153)
Follow-up to #88815.
No need to have two equivalent implementations here.
2022-08-08 19:03:30 +02:00
Jack Conradson 24e367fe0f
Add support for source fallback with the boolean field type (#89052)
This change adds a SourceValueFetcherSortedBooleanIndexFieldData to support boolean doc values 
for source fallback.
2022-08-08 08:38:48 -07:00
Nhat Nguyen cfad420cde
Enable BloomFilter for _id of non-datastream indices (#88409)
This PR adds BloomFilter to Elasticsearch and enables it for the _id 
field of non-data stream indices. BloomFilter should speed up the
performance of mget and update requests at a small expense of refresh,
merge, and storage.
2022-08-08 11:14:26 -04:00
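For readers unfamiliar with the data structure mentioned in #88409, the following is a toy Bloom filter. It is a sketch only and makes no claim about how the Lucene or Elasticsearch implementation works; `SimpleBloomFilter` and its hashing scheme are assumptions made for illustration. It shows the property that matters for `_id` lookups: a negative answer is definitive, so the more expensive lookup can be skipped, while a positive answer may be a false positive.

```java
import java.util.BitSet;

// Toy Bloom filter over string IDs (illustrative; not the Lucene implementation).
final class SimpleBloomFilter {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;

    SimpleBloomFilter(int numBits, int numHashes) {
        this.bits = new BitSet(numBits);
        this.numBits = numBits;
        this.numHashes = numHashes;
    }

    void add(String id) {
        for (int i = 0; i < numHashes; i++) {
            bits.set(position(id, i));
        }
    }

    // false: the ID was definitely never added. true: the ID was probably added.
    boolean mightContain(String id) {
        for (int i = 0; i < numHashes; i++) {
            if (!bits.get(position(id, i))) {
                return false;
            }
        }
        return true;
    }

    // Derives the i-th bit position from two hash values (double hashing).
    private int position(String id, int i) {
        int h1 = id.hashCode();
        int h2 = Integer.rotateLeft(h1 * 0x9E3779B9, 15) | 1;
        return Math.floorMod(h1 + i * h2, numBits);
    }
}
```

The extra bits have to be built at refresh and merge time and stored alongside the index, which is the trade-off the commit message describes.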
Keith Massey ee33383156
Polling for cluster diagnostics information (#89014)
This commit causes non-master-eligible nodes to poll a random master-eligible node for diagnostic
information every 10 seconds whenever the elected master goes null, in support of the health API's master
stability check.
2022-08-08 08:29:09 -05:00
Ievgen Degtiarenko 7c38041b9e
Make it explicit that test expects no rebalancing. (#89040)
This is required in case new shards allocator might be more proactive with
rebalancing.
2022-08-08 09:13:34 +02:00
Ievgen Degtiarenko 63f1ab5ab2
Fix flaky StableMasterDisruptionIT#testNoQuorum (#89064)
Above test may fail if the master node replies within 1s.
This happens on some of our slower CI workers.
Whitelisting additional error message.
2022-08-08 08:25:13 +02:00
Keith Massey e4a9967469
Handling the master stability health case where there has never been an elected master node (#89137)
If a master-eligible node comes up and has never seen an elected master
node (and assuming that a quorum requires more than one node), then it
ought to report that the master stability health is red because it
cannot form a quorum.
2022-08-05 06:06:01 +09:30
Armin Braun d6e6980b0b
Remove unused datastream snapshot utility from Metadata (#88535)
This method was introduced to fix datastream snapshots during
concurrent index/datastream changes but was never actually
used because we went with a different approach in the end.
=> remove it and its tests
2022-08-04 22:18:45 +02:00
Nhat Nguyen e3c33e2acd
Deduplicate fetching doc-values fields (#89094)
If a doc-values field matches multiple field patterns, then ES will
return the value of that doc-values field multiple times. Like fetching
fields from source, we should deduplicate the matching doc-values
fields.
2022-08-04 14:05:09 -04:00
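The deduplication described in #89094 can be sketched as follows, assuming a hypothetical `DocValueFieldResolver` with a simple `*` wildcard matcher (the real pattern matching in Elasticsearch is not shown here). Collecting matches into a `LinkedHashSet` keeps the first-seen order while ensuring that a field matched by several patterns is fetched only once.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical helper: expands field patterns into a duplicate-free set of field names.
final class DocValueFieldResolver {

    static Set<String> resolve(List<String> patterns, List<String> fieldNames) {
        Set<String> matched = new LinkedHashSet<>();
        for (String pattern : patterns) {
            Pattern regex = toRegex(pattern);
            for (String field : fieldNames) {
                if (regex.matcher(field).matches()) {
                    matched.add(field); // a no-op if an earlier pattern already matched this field
                }
            }
        }
        return matched;
    }

    // Treats '*' as "any sequence of characters" and everything else literally.
    private static Pattern toRegex(String pattern) {
        String regex = Arrays.stream(pattern.split("\\*", -1))
            .map(Pattern::quote)
            .collect(Collectors.joining(".*"));
        return Pattern.compile(regex);
    }
}
```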
likzn f28f4545b2
In the field capabilities API, re-add support for `fields` in the request body (#88972)
We previously removed support for `fields` in the request body, to ensure there
was only one way to specify the parameter. We've now decided to undo the
change, since it was disruptive and the request body is actually the best place to
pass variable-length data like `fields`.

This PR restores support for `fields` in the request body. It throws an error
if the parameter is specified both in the URL and the body.

Closes #86875
2022-08-04 13:44:50 -04:00
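The validation rule described in #88972 can be sketched as below. `FieldCapsRequestParser`, `resolveFields`, and the error text are hypothetical and only illustrate the rule; they are not the names used in the PR.

```java
import java.util.List;

// Hypothetical helper: picks the `fields` value from the URL or the request body.
final class FieldCapsRequestParser {

    // Either argument may be null when `fields` was not given in that location.
    static List<String> resolveFields(List<String> urlFields, List<String> bodyFields) {
        if (urlFields != null && bodyFields != null) {
            // Illustrative message; the actual wording in Elasticsearch may differ.
            throw new IllegalArgumentException(
                "[fields] may be specified in the URL or in the request body, but not in both"
            );
        }
        return urlFields != null ? urlFields : bodyFields;
    }
}
```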
Christos Soulios b81f4187ab
[TSDB] Metric fields in the field caps API (#88695)
To assist the user in configuring the visualizations correctly while leveraging TSDB
functionality, information about TSDB configuration should be exposed via the field 
caps API per field.

Especially for metrics fields, it must be clear which fields are metrics and if they belong 
to only time-series indexes or mixed time-series and non-time-series indexes.

It must also be possible to distinguish metric fields that belong to any of the following indices:

  -  Standard (non-time-series) indexes
  -  Time series indexes
  -  Downsampled time series indexes

This PR modifies the field caps API so that the mapping parameters time_series_dimension
and time_series_metric are presented only when they are set on fields of time-series indexes.
Those parameters are completely ignored when they are set on standard (non-time-series) indexes.

This PR revisits some of the conventions adopted by #78790
2022-08-04 20:42:34 +03:00
zhouhui 8f08c7b55b
Override bulk visit methods of exitable point visitor (#82120) 2022-08-04 11:48:36 -04:00
David Turner 7f2331cdfb
Merge trivial changes from desired balance feature branch (#89109) 2022-08-04 13:18:41 +01:00
Mary Gouseti 418883aeb9
maybeScheduleNow with delay 0 instead of 1 (#89110)
Replace the 1 millisecond delay with 0 when we want to schedule a
monitoring task now.
2022-08-04 21:39:57 +09:30
Rene Groeschke 3909b5eaf9
Add verification metadata for dependencies (#88814)
Removing the custom dependency checksum functionality in favor of Gradle's built-in dependency verification support.

- Use sha256 in favor of sha1 as sha1 is not considered safe these days.

Closes https://github.com/elastic/elasticsearch/issues/69736
2022-08-04 09:51:16 +02:00
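Conceptually, the checksum verification discussed in #88814 compares a digest of the downloaded artifact with a recorded value. The sketch below shows that comparison with the JDK's `MessageDigest`; it does not reproduce Gradle's own verification code, and `ChecksumVerifier` is a hypothetical name.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper: computes and checks SHA-256 checksums of artifact contents.
final class ChecksumVerifier {

    static String sha256Hex(byte[] content) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            StringBuilder hex = new StringBuilder();
            for (byte b : digest.digest(content)) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide SHA-256.
            throw new IllegalStateException("SHA-256 is not available", e);
        }
    }

    static boolean matches(byte[] content, String expectedSha256) {
        return sha256Hex(content).equalsIgnoreCase(expectedSha256);
    }
}
```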
Julie Tibshirani 21eb984e64
Deprecate the _knn_search endpoint (#88828)
This change deprecates the kNN search API in favor of the new 'knn' option
inside the search API. The 'knn' option is now the preferred way of performing
kNN search.

Relates to #87625
2022-08-03 15:19:01 -04:00
Marcin Słowiak 4e1a0631e8
Adjust logging message for adding index block (#85237) 2022-08-03 11:21:35 -04:00
Keith Massey 740bcde590
Fixing a race condition in CoordinationDiagnosticsServiceIT #89055
This makes sure that the test cluster is stable in CoordinationDiagnosticsServiceIT::testBlockClusterStateProcessingOnOneNode before proceeding with the rest of the test.
2022-08-03 08:27:01 -05:00
Rory Hunter 512bfebc10
Provide tracing implementation using OpenTelemetry + APM agent (#88443)
Part of #84369. Implement the `Tracer` interface by providing a
module that uses OpenTelemetry, along with Elastic's APM
agent for Java.

See the file `TRACING.md` for background on the changes and the
reasoning for some of the implementation decisions.

The configuration mechanism is the most fiddly part of this PR. The
Security Manager permissions required by the APM Java agent make
it prohibitive to start an agent from within Elasticsearch
programmatically, so it must be configured when the ES JVM starts.
That means that the startup CLI needs to assemble the required JVM
options.

To complicate matters further, the APM agent needs a secret token
in order to ship traces to the APM server. We can't use Java system
properties to configure this, since otherwise the secret will be
readable to all code in Elasticsearch. It therefore has to be
configured in a dedicated config file. This in itself is awkward,
since we don't want to leave secrets in config files. Therefore,
we pull the APM secret token from the keystore, write it to a config
file, then delete the config file after ES starts.

There's a further issue with the config file. Any options we set
in the APM agent config file cannot later be reconfigured via system
properties, so we need to make sure that only "static" configuration
goes into the config file.

I generated most of the files under `qa/apm` using an APM test
utility (I can't remember which one now, unfortunately). The goal
is to set up a complete system so that traces can be captured in
APM server, and the results in Elasticsearch inspected.
2022-08-03 14:13:31 +01:00
Denilson das Mercês Amorim 6bf5078fa9
Improve efficiency of BoundedBreakIteratorScanner fragmentation algorithm (#89041)
As discussed in #73569 the current implementation is too slow in certain scenarios.

The inefficient part of the code can be stated as the following problem:

Given a text (getText()) and a position in this text (offset), find the sentence 
boundary before and after the offset, in such a way that the after boundary is 
maximal but respects end boundary - start boundary < fragment size.

In case it's impossible to produce an after boundary that respects the said 
condition, use the nearest boundary following offset.

The current approach begins by finding the nearest preceding and following boundaries, 
and expands the following boundary greedily while it respects the problem restriction. This 
is fine asymptotically, but BreakIterator which is used to find each boundary is sometimes 
expensive.

This new approach maximizes the after boundary by scanning for the last boundary 
preceding the position that would cause the condition to be violated (i.e. knowing start
boundary and offset, how many characters are left before resulting length is fragment size). 
If this scan finds the start boundary, it means it's impossible to satisfy the problem 
restriction, and we get the first boundary following offset instead (or better, since we 
already scanned [offset, targetEndOffset], start from targetEndOffset + 1).
2022-08-03 12:07:17 +01:00
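The approach described in #89041 can be sketched with the JDK's `java.text.BreakIterator`. This is an illustration under stated assumptions, not the `BoundedBreakIteratorScanner` code: `FragmentBoundaries` and `find` are hypothetical names, and the fallback position is chosen conservatively as the later of `offset` and the scan limit minus one. Instead of stepping forward boundary by boundary, it asks once for the last boundary before the position at which the fragment would become too long.

```java
import java.text.BreakIterator;
import java.util.Locale;

// Hypothetical sketch of the "scan backwards from the size limit" idea.
final class FragmentBoundaries {

    // Returns {start, end}: sentence boundaries around offset, with end - start < fragmentSize if possible.
    static int[] find(String text, int offset, int fragmentSize) {
        if (offset < 0 || offset > text.length() || fragmentSize < 1) {
            throw new IllegalArgumentException("offset or fragmentSize out of range");
        }
        BreakIterator sentences = BreakIterator.getSentenceInstance(Locale.ROOT);
        sentences.setText(text);

        // Nearest boundary before the offset; preceding() returns DONE (-1) at the start of the text.
        int start = Math.max(sentences.preceding(offset), 0);

        // First position at which the fragment would no longer satisfy end - start < fragmentSize.
        long limit = (long) start + fragmentSize;
        if (limit > text.length()) {
            return new int[] { start, text.length() }; // the rest of the text fits
        }

        // One backwards scan finds the maximal admissible end boundary.
        int end = sentences.preceding((int) limit);
        if (end <= offset) {
            // No admissible boundary after the offset: take the first boundary beyond the scanned range.
            end = sentences.following(Math.max(offset, (int) limit - 1));
            if (end == BreakIterator.DONE) {
                end = text.length();
            }
        }
        return new int[] { start, end };
    }
}
```

In the common case this costs two `preceding` calls regardless of how many sentences fit in the fragment, whereas greedy expansion needs one `following` call per sentence.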
Armin Braun f4fb03c5e2
Make org.elasticsearch.cluster.routing.RoutingNode#copyShards use Array (#88788)
We used this in three spots, where it copies potentially huge arrays.
One of those spots doesn't need the `copyShards` call at all
and can use the normal iterator as there's no concurrent modification.

The other two spots can at least just use an array, which will iterate
a little faster than a mutable list and also potentially saves another
round copying the array in the `ArrayList` constructor that the compiler
seems to not be able to eliminate in all cases.
2022-08-03 12:43:54 +02:00
Rory Hunter 9285249533
Wrap code in new tracing contexts where required (#88920)
Part of #84369. Split out from #88443. This PR wraps parts of the code
in a new tracing context. This is necessary so that a tracing
implementation can use the thread context to propagate tracing headers,
but without the code attempting to set the same key twice in the thread
context, which is illegal. In order to avoid future diff noise, the wrapped
code has mostly been refactored into methods.

Note that in some places we actually clear the tracing context
completely. This is done where the operation to be performed should have
no association with the current trace context. For example, when
creating a new index via a REST request, the resulting background tasks
for the index should not be associated with the REST request in
perpetuity.
2022-08-03 11:15:50 +01:00
Mary Gouseti d828c2a642
Health API - Monitoring local disk health (#88390)
This PR introduces the local health monitoring functionality needed for
#84811 . The monitor uses the `NodeService` to get the disk usage stats
and determines the node's disk health.

When a change in the disk's health is detected or when the health node changes,
this class will be responsible for sending the node's health to the health
node. Currently this is simulated with a method that just logs the
current health.

The monitor keeps the last reported health; this way, if something fails,
it will try to resend the new health state on the next check.
2022-08-03 17:40:26 +09:30
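The retry behaviour described in #88390 can be sketched as follows. `LocalHealthReporter` and its `sender` callback are hypothetical; the PR itself only logs the health at this stage. The key point is that the last reported health is updated only after a successful send, so a failed send is retried on the next check.

```java
import java.util.function.Predicate;

// Hypothetical sketch: reports health changes and retries failed reports on the next check.
final class LocalHealthReporter<H> {
    private final Predicate<H> sender; // returns true when the health was delivered
    private H lastReported;            // null until the first successful report

    LocalHealthReporter(Predicate<H> sender) {
        this.sender = sender;
    }

    // Called periodically with the freshly computed health.
    void check(H current) {
        if (current.equals(lastReported)) {
            return; // nothing new to report
        }
        if (sender.test(current)) {
            lastReported = current; // only remember what was actually delivered
        }
    }
}
```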
Albert Zaharovits c7e10e70e1
Refactor WildcardExpressionResolver to better track usages of indices lookup (#89000)
This is a pure refactoring of the WildcardExpressionResolver.
The objective is to restrict access to the indices lookup through the context parameter only.
Eventually, Security is going to plug into the context and only show a restricted view of the
indices lookup, particular to the user context.
2022-08-03 09:01:44 +03:00
Jack Conradson 3bb4a84bdd
Support source fallback for double, float, and half_float field types (#89010)
This change adds a SourceValueFetcherSortedDoubleIndexFieldData to support double doc values types for source fallback. This also adds support for double, float and half_float field types.
2022-08-02 10:13:58 -07:00
Armin Braun d9dc3a9629
Preemptively initialize routing nodes and indices lookup on all node types (#89032)
Follow up to #89005 running the initialization as soon as possible on non-master
nodes as well.
2022-08-02 17:30:18 +02:00
Rory Hunter 5f14c79320
Wrap ML model loading task in new tracing context (#89024)
Part of #84369.

ML uses the task framework to register a task for each loaded model.
These tasks are not executed in the usual sense, and it does not make
sense to trace them using APM. Therefore, make it possible to register
a task without also starting tracing.
2022-08-02 14:45:14 +01:00
Ievgen Degtiarenko 1337de73e3
Mute testBlockClusterStateProcessingOnOneNode (#89038)
Related to: #89015
2022-08-02 21:41:08 +09:30
Ievgen Degtiarenko 35e0736956
Make it explicit that test expects no rebalancing (#89028)
This is required in case new shards allocator might be more proactive with rebalancing.
2022-08-02 13:03:14 +02:00
Armin Braun 9bed4b89fd
Preemptively compute RoutingNodes and the indices lookup during publication (#89005)
Computing routing nodes and the indices lookup takes considerable time
for large states. Both are needed during cluster state application and
prior to this change would be computed on the applier thread in all cases.
By running the creation of both objects concurrently to publication, the
many shards benchmark sees a 10%+ reduction in the bootstrap time to
50k indices.
2022-08-02 11:02:00 +02:00
Ievgen Degtiarenko e4214efe6d
Make it explicit that test expects no rebalancing. (#88993)
This is required in case new shards allocator might be more proactive with
rebalancing.
2022-08-02 10:03:31 +02:00
Keith Massey 352a688b04
Eliminating initial delay of CoordinationDiagnosticsService#beginPollingClusterFormationInfo for integration tests (#89001) 2022-08-01 13:42:34 -05:00
Mary Gouseti 524543e41c
Extract least/most available disk space DiskUsage (#88996) 2022-08-01 20:32:59 +02:00
Christos Soulios ad2dc834a7
Add `synthetic_source` support to `aggregate_metric_double` fields (#88909)
This PR implements synthetic_source support for the aggregate_metric_double
field type.

Relates to #86603
2022-08-01 20:42:25 +03:00
Armin Braun b7240393c6
Save loop over all local shards in IndicesClusterService.applyClusterState (#88210)
We can save another two loops here by checking for shards to fail in the same loop that updates or creates shards.
Also, we only need to loop over all indices services locally once for deleting indices as a whole or just shards
out of existing indices.
2022-08-01 17:53:32 +02:00
Jack Conradson 5194d29b1c
Support source fallback for byte, short, and long fields (#88954)
This change adds source fallback support for byte, short, and long fields. These use the already 
existing class SourceValueFetcherSortedNumericIndexFieldData.
2022-08-01 08:23:36 -07:00
Keith Massey 579692d5a3
Fix race conditions in master stability polling (#88874)
This fixes some possible race conditions in the cluster formation polling of the stable master code.
It also prevents the list of tasks from growing indefinitely.
2022-08-01 09:29:21 -05:00
Armin Braun 70a7276d77
Avoid expensive loop in indicesDeletedFromClusterState() when possible (#88986)
The loop over all indices here gets very expensive for large states. We
can often avoid it when the metadata changes but the indices maps do not.
2022-08-01 16:06:43 +02:00
Nik Everett 87ab933c8b
Remove calls to deprecated xcontent method (#84733)
This removes many calls to the last remaining `createParser` method that
I deprecated in #79814, migrating callers to one of the new methods that
it created.
2022-08-01 22:18:03 +09:30