Commit Graph

75144 Commits

Author SHA1 Message Date
Ryan Ernst 6375e9f443
Add native access library (#105100)
Elasticsearch requires access to some native functions. Historically
this has been achieved with the JNA library. However, JNA is a
complicated, magical library, and has caused various problems booting
Elasticsearch over the years. The new Java Foreign Function and Memory
API allows access to call native functions directly from Java. It also
has the advantage of tight integration with hotspot which can improve
performance of these functions (though performance of Elasticsearch's
native calls has never been much of an issue since they are mostly at
boot time).

This commit adds a new native lib that is internal to Elasticsearch. It
is built to use the foreign function api starting with Java 21, and
continue using JNA with Java versions below that.

Only one function, checking whether Elasticsearch is running as root, is
migrated. Future changes will migrate other native functions.
2024-02-07 18:27:09 -05:00
Ry Biesemeyer 0022005e17
Add stable ThreadPool constructor to LogstashInternalBridge (#105163) 2024-02-07 17:20:59 -05:00
Nhat Nguyen c736c34035
Avoid wrapping searchers multiple times in mget (#104227)
Wrapping a searcher can be expensive, and this optimization avoids
wrapping the same searcher multiple times for an MGET request.

Closes #85069
2024-02-07 12:53:34 -08:00
Fabio Busatto b1adb78f6c
[DOCS] Update remote cluster setup instructions (#105256) 2024-02-07 21:11:57 +01:00
Niels Bauman 4b54526e8f
Fix `UpdateHealthInfoCacheActionTests.testRequestSerialization` failing (#105257)
Fixes #105254
2024-02-07 14:25:01 -05:00
Ryan Ernst 18a1ac09e7
Use open and fstat in preallocate (#105171)
Preallocate opens a FileInputStream in order to get a native file
descriptor to pass to native functions. However, getting at the file
descriptor requires breaking modular access. This commit adds native
posix functions for opening/closing and retrieving stats on a file in
order to avoid requiring additional permissions.
2024-02-07 13:40:05 -05:00
Ryan Ernst 2ca6df71d6
Make ProviderLocator aware of boot qualified exports (#105250)
Qualified exports in the boot layer only work when they are to other boot
modules. Yet Elasticsearch has dynamically loaded modules as in plugins.
For this purpose we have ModuleQualifiedExportsService. This commit
moves loading of ModuleQualifiedExportsService instances in the boot layer
into core so that it can be reused by ProviderLocator when a qualified
export applies to an embedded module.
2024-02-07 09:43:22 -08:00
Keith Massey d8fdf6f04d
Releasing child request builder memory from BulkRequestBuilder (#105194) 2024-02-07 10:57:58 -06:00
Mary Gouseti 65d1d3d47d
Change the rest client configuration in the LazyRolloverDataStreamIT (#105243) 2024-02-07 17:44:40 +02:00
Tim Rühsen 0ea58c8ec2
[Profiling] Add azure_cost_factor request parameter (#105231) 2024-02-07 16:42:02 +01:00
Luca Cavanna 32cbb49a3f
Remove SearchException usages without a proper status code (#105150)
We have some usages of SearchException that don't provide a cause exception and also don't define a status code. That means that the status code of such requests will default to 500 which is in many cases not a good choice. Normally, for internal server error a cause is associated with the wrapper exception.

This scenario is not very common, and looks like a leftover of shard validation that used to happen on shards, which can be moved to the coordinating node. This commit moves some of the exceptions thrown in SearchService#parseSource to SearchRequest#validate. This way we will fail before serializing the shard level request to all the shards, which is much better.

Note that for backwards compatibility reasons, we need to keep throwing the same exception from the data node, even though intuitively it is now replaced by the same validation on the coordinating node. This is because in a mixed cluster scenario, an older node that does not perform the validation as coordinating node could serialize shard level requests that need to be checked again on data nodes, to prevent unexpected situations.
2024-02-07 16:12:27 +01:00
Liam Thompson fb743da0d7
[DOCS][ESQL] Document _source metadata field (#105237)
* [DOCS][ESQL] Document _source metadata field

* 🚗 Minor copyedit to entire page
2024-02-07 15:57:51 +01:00
Martijn van Groningen cc67205c25
Assign index.downsample.interval setting when downsample index gets created. (#105241)
This avoids keeping the downsamplingInterval field around. Additionally, the
downsample interval is known when the downsample is invoked and
doesn't change.
2024-02-07 09:31:26 -05:00
Niels Bauman 64891011d3
Extend `repository_integrity` health indicator for unknown and invalid repos (#104614)
This PR extends the repository integrity health indicator to cover also unknown and invalid repositories. Because these errors are local to a node, we extend the `LocalHealthMonitor` to monitor the repositories and report the changes in their health regarding the unknown or invalid status.
To simplify this extension in the future, we introduce the `HealthTracker` abstract class that can be used to create new local health checks.
Furthermore, we change the severity of the health status when the repository integrity indicator reports unhealthy from `RED` to `YELLOW` because even though this is a serious issue, there is no user impact yet.
2024-02-07 15:18:55 +01:00
Craig Taverner a58b2c2b05
Move doc-values classes needed by ST_INTERSECTS to server (#104980)
* Move doc-values classes needed by ST_INTERSECTS to server

These classes are needed by ESQL spatial queries, and are not licensed in a way that prevents this move.
Since they depend on lucene it is not possible to move them to a library.
Instead they are moved to be co-located with the GeoPoint doc-values classes that already exist in server.

* Moved to lucene package org.elasticsearch.lucene.spatial

* Moved Geo/ShapeDocValuesQuery to server because it is Lucene specific

And this gives us access to these classes from ESQL for lucene-pushdown of spatial queries.
2024-02-07 15:00:38 +01:00
Daniel Mitterdorfer 9651cd7e26
[Profiling] Use plain arrays in stack traces (#105226)
With this commit we refactor the internal representation of stacktraces
to use plain arrays instead of lists for some of its properties. The
motivation behind this change is simplicity:

* It avoids unnecessary boxing
* We could eliminate a few redundant null checks because we use
  primitive types now in some places
* We could slightly simplify runlength decoding
2024-02-07 14:39:38 +01:00
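As a rough illustration of the plain-array approach described in #105226, the sketch below decodes a run-length encoded byte sequence straight into an `int[]`, so no `Integer` boxing takes place. The `RunLengthDecoder` class and the (count, value) byte-pair layout are assumptions made for this example; they are not taken from the profiling plugin.

```java
import java.util.Arrays;

// Hypothetical helper for illustration only; the input layout is assumed to be
// a sequence of (count, value) byte pairs, each read as an unsigned byte.
final class RunLengthDecoder {

    static int[] decode(byte[] encoded) {
        if (encoded.length % 2 != 0) {
            throw new IllegalArgumentException("expected (count, value) pairs");
        }
        // First pass: sum the run lengths so the result can be allocated once.
        int total = 0;
        for (int i = 0; i < encoded.length; i += 2) {
            total += encoded[i] & 0xFF;
        }
        // Second pass: expand each run directly into the primitive array.
        int[] decoded = new int[total];
        int pos = 0;
        for (int i = 0; i < encoded.length; i += 2) {
            int count = encoded[i] & 0xFF;
            int value = encoded[i + 1] & 0xFF;
            Arrays.fill(decoded, pos, pos + count, value);
            pos += count;
        }
        return decoded;
    }
}
```

With a `List<Integer>` result, each element outside the small-integer cache would need its own boxed object and callers would have to allow for null elements; the primitive array needs neither.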
Martijn van Groningen baf8b5ae38
Fix a few downsample api issues (#105228)
Improve downsampling by making the following changes:

- Avoid NPE and assert tripping when fetching the last processed tsid.
- If the write block has been set, then there is no reason to start the downsample persistent tasks, since shard level downsampling has completed. Not doing so also causes ILM/DSL to get stuck on downsampling. In this case shard level downsampling should be skipped.
- Sometimes the source index may not be allocated yet on the node performing the shard level downsampling operation. This caused an NPE; with this PR, the shard level downsample now fails with a less disturbing error.

Additionally unmute
DataStreamLifecycleDownsampleDisruptionIT#testDataStreamLifecycleDownsampleRollingRestart

Relates to #105068
2024-02-07 08:28:28 -05:00
David Turner 25dd12df3b AwaitsFix for #105236 2024-02-07 12:11:51 +00:00
Pooya Salehi db4d31ddb4
Improve exception handling for stateless realtime-get/mget (#105028)
Relates #105003, ES-5727
2024-02-07 12:50:57 +01:00
Mary Gouseti 011876367a
Execute lazy rollover with an internal dedicated user #104732 (#104905)
The unconditional rollover that is a consequence of a lazy rollover command is triggered by the creation of a document. In many cases, the user triggering this rollover won't have sufficient privileges to ensure the successful execution of this rollover. For this reason, we introduce a dedicated rollover action and a dedicated internal user to cover this case and enable this functionality.
2024-02-07 13:01:01 +02:00
Armin Braun 01f19ecab2
Simplify and optimize code around TermQueryBuilder.BinaryValues (#105220)
Without a change in behavior, we can remove the unused ListValues as well as most
of the allocations when serializing the values.
This still leaves the problem that the temporary buffer in `valueRef` could be massive
in size, that will be addressed in a short follow-up that this change sets up.
2024-02-07 10:49:11 +01:00
Slobodan Adamović 0fefd5b881
Validate settings before reloading AD/LDAP bind password (#105133)
This is a followup to https://github.com/elastic/elasticsearch/pull/104320 which 
adds validation during secure setting reload of a `bind_password`. 
The reload of `secure_bind_password` will now fail with an exception instead 
of logging a deprecation warning.
2024-02-07 10:00:39 +01:00
David Turner 3b7b86c507
Simplify `ChunkedToXContentHelper#singleChunk` (#105225)
There's no need for this helper to take more than one argument. Almost
all the usages only passed in a single argument, and the few cases that
supplied more than one can be rewritten as a single argument to save
allocating all those extra lambdas.
2024-02-07 03:53:02 -05:00
elasticsearchmachine aaadc30111
Forward port release notes for v8.12.1 (#105218) 2024-02-07 09:16:38 +01:00
Luigi Dell'Aquila 335a75b3ef
Fix exception handling on DateFormatters.forPattern() with wrong date pattern (#105048) 2024-02-07 08:52:10 +01:00
Johannes Fredén 334aa1bc8d
Add support for fetching user profileId in Query Users (#104923)
Add support for fetching user profileId in Query Users
2024-02-07 08:49:39 +01:00
Armin Braun 5afe81cd75
Optimize SearchHit#resolveLookupFields a little (#105222)
No need to copy the keyset here to do iteration + mutation. Just update the map entries directly to save a few cycles.
2024-02-07 07:34:20 +01:00
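The idea behind #105222 can be sketched in isolation: when only the values of a map change, iterating over the entry set and calling `setValue` avoids copying the key set first. The `InPlaceMapUpdate` class and its upper-casing transformation are hypothetical stand-ins; this is not the `SearchHit` code.

```java
import java.util.ArrayList;
import java.util.Locale;
import java.util.Map;

// Hypothetical example, not the SearchHit implementation.
final class InPlaceMapUpdate {

    // Copies the key set before replacing each value: one extra list
    // allocation plus a get() and a put() lookup per key.
    static void upperCaseWithCopy(Map<String, String> map) {
        for (String key : new ArrayList<>(map.keySet())) {
            map.put(key, map.get(key).toUpperCase(Locale.ROOT));
        }
    }

    // Updates each entry directly. setValue does not change the set of keys,
    // so it is safe to call while iterating over the entry set.
    static void upperCaseInPlace(Map<String, String> map) {
        for (Map.Entry<String, String> entry : map.entrySet()) {
            entry.setValue(entry.getValue().toUpperCase(Locale.ROOT));
        }
    }
}
```

Both methods leave the map in the same state; the second simply skips the intermediate copy and the repeated lookups.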
Ryan Ernst 7c039b1728
AwaitsFix more tests for #104838 2024-02-06 16:42:01 -08:00
James Baiera 6b6fb71cf3
Remove non-portable newline from test (#105209)
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
2024-02-06 16:31:03 -05:00
Jonathan Buttner bb016bdbe9
[ML] Inference service should reject tasks during shutdown (#105213)
* Fixing inference shutdown bug

* Update docs/changelog/105213.yaml

---------

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
2024-02-06 16:21:23 -05:00
Fang Xing a8070debb5
[ES|QL] Allow AUTO_BUCKET to accept references created by EVAL as input to from and to (#104772)
Defer auto_bucket foldable verification to LogicalPlanOptimizer
2024-02-06 16:04:41 -05:00
Max Hniebergall 2c5c134623
[ML] Inference service support for multilingual-e5 builtin and customEland text_embedding models (#104949)
Add text_embedding service
Add support for multilingual-e5 builtin models
Add support for custom eland text_embedding models
2024-02-06 15:28:51 -05:00
Nik Everett ed7f977523
ESQL: More tests for `STATS BY blah, blah` syntax (#105204)
In ESQL you can write a grouping `STATS` command without any
functions - just `STATS BY foo, bar, baz` and we'll calculate all the
combinations of group keys without running any functions. We only have a
single example of that in our tests though! This adds two more that are
slightly more complex. Just out of paranoia.
2024-02-06 15:13:08 -05:00
Pat Whelan 9ac31a8af1
[Transform] Remove duplicate checkpoint audits (#105164)
Transform Checkpoints have a chance to log duplicate audits or drop
iterations.  The volatile counters can be read and incremented in
multiple threads, potentially storing the same value back into memory.

Replacing volatile counters with a single Atomic counter, which counts
down the iterations until it reaches zero, then updates the counter to
the next audited checkpoint.

Closes #105106
2024-02-06 14:41:25 -05:00
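The countdown pattern described in #105164 might look roughly like the sketch below: a single atomic counter is decremented on every checkpoint and reset when it reaches zero, so exactly one caller per cycle is told to audit. The `CheckpointAuditCounter` class, its fixed interval, and the method name are assumptions for illustration; the real Transform code may schedule audits differently.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch; not the Transform implementation from the commit.
final class CheckpointAuditCounter {

    private final long interval;
    // Checkpoints left until the next audit, including the current one.
    private final AtomicLong remaining;

    CheckpointAuditCounter(long interval) {
        if (interval < 1) {
            throw new IllegalArgumentException("interval must be >= 1");
        }
        this.interval = interval;
        this.remaining = new AtomicLong(interval);
    }

    // Returns true once per `interval` calls. getAndUpdate applies the
    // decrement-or-reset step atomically, so two threads can never observe
    // the same previous value and both decide to audit.
    boolean shouldAudit() {
        long previous = remaining.getAndUpdate(r -> r <= 1 ? interval : r - 1);
        return previous <= 1;
    }
}
```

With separate `volatile` fields, a read followed by a write is not atomic, which is how two threads could store the same value back and either duplicate or skip an audit.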
David Kyle 3d380cc899
Fix compilation (#105211)
The changes in #105183 clashed with #104363
2024-02-06 14:16:58 -05:00
Jedr Blaszyk 24f9682b7d
[Connector API] Support filtering connectors by service type and a query (#105178) 2024-02-06 19:35:06 +01:00
Jonathan Buttner f1b4878c72
[ML] Switch OpenAI and Cohere configuration to use model_id field instead of model (#105195)
* Adding model_id for cohere

* Preferring model_id

* Updating openai request task settings

* Removing logging from request

* suggestedChanges

---------

Co-authored-by: Max Hniebergall <max.hniebergall@elastic.co>
2024-02-06 13:30:04 -05:00
Nhat Nguyen 8b445bfe67
Harden index mapping parameter check in enrich runner (#105096)
There is a case where the mapper parser throws a MapperParsingException 
instead of not consuming the index:false parameter. We missed that case
in the previous fix (see #98038). This PR hardens that check by 
returning false when hitting a MapperParsingException.

Relates #98038
2024-02-06 10:13:24 -08:00
elasticsearchmachine bb62f05c6d Bump versions after 8.12.1 release 2024-02-06 18:07:05 +00:00
Fang Xing 0fb5dee75b
[ES|QL] Add function log(base, value) (#104913)
Add a new scalar function log
2024-02-06 13:03:53 -05:00
David Kyle 47828788d9
[ML] Fix handling surrogate pairs in the XLM Roberta tokenizer (#105183)
UTF16 represents some characters as surrogate pairs, which are represented
by 2 UTF16 characters; emojis are often encoded as surrogate pairs. This PR
fixes an error in calculating the number of bytes required to convert a UTF16
string to UTF8, as surrogate pairs were not processed properly.
2024-02-06 18:00:58 +00:00
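To make the surrogate-pair issue from #105183 concrete, here is a minimal sketch of computing the UTF-8 length of a UTF-16 string. A surrogate pair encodes one code point that takes 4 bytes in UTF-8, so it has to be counted as a unit rather than as two independent characters. The `Utf8Length` class is a hypothetical helper and not the tokenizer code.

```java
// Hypothetical helper for illustration; not the XLM Roberta tokenizer code.
final class Utf8Length {

    static int utf8Length(CharSequence s) {
        int bytes = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < 0x80) {
                bytes += 1; // ASCII
            } else if (c < 0x800) {
                bytes += 2;
            } else if (Character.isHighSurrogate(c)
                && i + 1 < s.length()
                && Character.isLowSurrogate(s.charAt(i + 1))) {
                bytes += 4; // one supplementary code point, two UTF-16 units
                i++;        // skip the low surrogate that was just consumed
            } else {
                // Remaining BMP characters. Lone surrogates are also counted
                // as 3 bytes here, which is an assumption of this sketch;
                // String.getBytes substitutes '?' for them instead.
                bytes += 3;
            }
        }
        return bytes;
    }
}
```

For well-formed input the result matches `s.getBytes(StandardCharsets.UTF_8).length` without allocating the byte array.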
elasticsearchmachine fef2af3b07 Prune changelogs after 7.17.18 release 2024-02-06 17:50:06 +00:00
elasticsearchmachine 3749842372 Bump versions after 7.17.18 release 2024-02-06 17:48:38 +00:00
Nik Everett a7ca62de8e
Document ESQL docs examples (#105197)
This adds some docs to the top of `docs.csv-spec` and
`docs-IT_tests_only.csv-spec` telling folks not to add more stuff there
and instead put new examples into whatever files they line up with. It
also shifts some things out of the file to "prime the pump" on cleaning
it up.
2024-02-06 12:34:02 -05:00
Przemysław Witek 9b584aa1f2
[Transform] Allow transforms to use PIT with remote clusters again (#105192) 2024-02-06 18:30:25 +01:00
Luigi Dell'Aquila 669934fc0d
ES|QL: remove PROJECT keyword from the grammar (#105064) 2024-02-06 18:17:11 +01:00
Ignacio Vera 4d5416912b
Use an AbstractList to build the AggregationList for reduction (#105200)
We are building a list of InternalAggregations from a list of Buckets, therefore we can use an AbstractList to create the actual list and save some allocations.
2024-02-06 17:53:41 +01:00
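The allocation saving mentioned in #105200 comes from exposing a list as a view over another list rather than copying it. A minimal sketch of that technique follows; the generic `MappedListView` helper is an assumption for illustration, whereas the commit itself maps buckets to `InternalAggregations`.

```java
import java.util.AbstractList;
import java.util.List;
import java.util.function.Function;

// Hypothetical generic helper; the commit applies the same idea to buckets.
final class MappedListView {

    // Returns a read-only view that maps elements on access. No backing
    // array is allocated; only the small view object itself is created.
    static <T, R> List<R> view(List<T> source, Function<? super T, ? extends R> mapper) {
        return new AbstractList<R>() {
            @Override
            public R get(int index) {
                return mapper.apply(source.get(index));
            }

            @Override
            public int size() {
                return source.size();
            }
        };
    }
}
```

For example, `MappedListView.view(List.of("a", "bb"), String::length)` behaves like the list `[1, 2]`. The trade-off is that the mapper runs on every `get`, so the approach fits best when the mapping is cheap or the list is traversed once.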
David Roberts e8288fbaa8
[ML] Improve docs around ML nodes and xpack.ml.enabled (#105199)
Since these docs were originally written there have been a couple
of changes:

1. We now support aarch64 as well as x86_64, so the SSE4.2 guidance
   needed clarification.
2. ML is more deeply embedded into Elasticsearch functionality
   across nodes that are not ML nodes. For example, ingest pipelines
   now routinely use ML, and, in the near future, index mappings
   will too in the form of semantic text. Although we cannot mandate
   that xpack.ml.enabled is set uniformly across the cluster, as
   that would be a breaking change, we should say ever more strongly
   that ML must be enabled on all nodes if all ML functionality is to
   work correctly. The primary reason for wanting to disable ML is
   hardware incompatibility, and if ML is disabled for that reason
   then it should not be used at all.
2024-02-06 16:20:46 +00:00
David Kyle 5f325187cb
[ML] Make task_type optional (#104483)
Makes the task_type element of the _inference API optional so that 
it is possible to GET, DELETE or POST to an inference entity without
providing the task type
2024-02-06 16:15:24 +00:00
Joe Gallo 341f845832
Ingest geoip: tidy up logging code (#105086) 2024-02-06 10:44:48 -05:00