Today we log at `DEBUG` when we receive a scroll response and send the
bulk request but do not log the completion of the bulk request or the
start of the next scroll request. This makes it impossible to tell from
the logs how long these things are taking.
This commit adds the missing logging.
Originally, a doc fields phase was created to collect information about which fields are accessed using constant values. This was going to be used for detecting cyclical field access in runtime fields, but another approach was taken instead. This change deletes the unused phase.
Fixes a few rough edges in this class:
* we need to always pass a flush call down the pipeline and not just conditionally
if they apply to the message handler, otherwise we lose flushes e.g. when a channel
becomes not-writable due to a write from off the event-loop that exceeds the outbound
buffer size
* this is suspected of causing recently observed intermittent and unexplained slow message writes (logged by the outbound slow logger) where a message became stuck until a subsequent message was sent (e.g. during periodic leader checks or so)
* Pass size `0` messages down the pipeline instead of just resolving their promise to avoid
unexpected behavior (though we don't make use of `0`-length writes as of today)
* Avoid unnecessary flushes in queued-writes loop and only flush if the channel stops being
writable
* Release buffers on queued writes that we fail on channel close (not doing this wasn't causing bugs today because we release the underlying bytes elsewhere but could cause trouble later)
Unfortunately, I was not able to reproduce the issue in the first point reliably as the timing is really tricky. I therefore tried to make this PR as short and uncontroversial as possible. I think there are possible further improvements here and this should have been caught by a test, but it's not yet clear to me how to design a reliable reproducer.
This commit upgrades the existing SSPL licensed "ssl-config" library
to include additional features that are supported by the X-Pack SSL
library.
This commit does not make any changes to X-Pack to use these new
features - it introduces them in preparation for their future use by
X-Pack.
The reindex module is updated to reflect API changes in ssl-config
This change adds an additional assertion in GeoIpDownloaderIT.testInvalidTimestamp which makes sure that validity checks work both ways (going out of validity and back). It should also fix a race in the cleanUp method that led to occasional failures.
Closes #75221
Closes #74358
Adjust GeoIpDownloaderIT test suite to wait for managed database files
to be removed after each test.
After each test the geoip downloader is disabled, which should eventually
remove the managed geoip database files. This happens in the background.
However, if a new test starts that assumes that the builtin databases are
used, then that test can fail because its assertions will not hold. The
changes in this commit should address this.
Closes #74358
This PR moves tests which are not meant to be fixed (ml/, vectors/) to a separate "not to be fixed" list so that we can see which compatible changes are meant to be implemented.
relates #51816
ParseContext is used to parse documents. It was easily confused with ParserContext (now renamed to MappingParserContext) which is instead used to parse mappings.
To remove any confusion, this commit renames ParseContext to DocumentParserContext and adapts its subclasses accordingly.
We currently have one ParseContext class, which is used to parse incoming documents, not to be confused with the former ParserContext (now renamed to MappingParserContext) which is instead used to parse mappings.
There are a few implementations of ParseContext, but mostly the InternalParseContext one is used. There is also a FilterParseContext that delegates to a given context for all methods besides the ones it explicitly overrides.
This commit attempts to simplify ParseContext by extracting its InternalParseContext implementation, moving it to where it's used (within DocumentParser) and making it private, so that callers use the super-class instead. This hides some implementation details that only InternalParseContext knows about, concerning nested documents and the way they are stored in lucene.
Also, we are introducing separate test implementations in place of reusing InternalParseContext in tests too.
Additionally, FilterParseContext can be greatly simplified by relying on a copy constructor, so that it does not have to override every single method to delegate to the provided context, at least for the behaviour that can't be overridden (final methods).
This change fixes a problem with GeoIpProcessor when there's a GeoIpTaskState present in the cluster state but there's no database matching the one used by the processor. This can happen when some but not all databases have already been updated.
User defined functions are instance methods on the Script class.
Update lambdas and method references to capture the script `this`
reference.
The def method encoding string takes an extra char at index 1, indicating
whether to capture the script reference.
For runtime fields, this means emit, which is already a script instance
method, now works in user defined functions.
Fixes: #69742
Refs: #68235
Previously removed in #46985. The yaml test is included in this PR, but
will be removed once #74689 is merged.
relates #54160
relates main meta issue #51816
This PR adds a new API for doing streaming serialization writes to a repository to enable repository metadata of arbitrary size and at bounded memory during writing.
The existing write-APIs require knowledge of the eventual blob size beforehand. This forced us to materialize the serialized blob in memory before writing, costing a lot of memory in case of e.g. very large `RepositoryData` (and limiting us to `2G` max blob size).
With this PR the requirement to fully materialize the serialized metadata goes away and the memory overhead becomes completely bounded by the outbound buffer size of the repository implementation.
As we move to larger repositories this makes master node stability a lot more predictable since writing out `RepositoryData` no longer takes as much memory (the same applies to shard level metadata). It also enables aggregating multiple metadata blobs into a single larger blob without massive overhead and removes the 2G size limit on `RepositoryData`.
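A minimal sketch of the bounded-memory idea, not the actual repository API: serialization writes into a stream that hands off a chunk whenever a fixed-size buffer fills up, so memory use stays at the buffer size regardless of the total blob size. The class name `ChunkedBlobOutputStream` and the `Consumer<byte[]>` sink are assumptions made for illustration.

```java
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical illustration; the real blob store interfaces differ.
final class ChunkedBlobOutputStream extends OutputStream {
    private final byte[] buffer;          // the only memory held while writing
    private final Consumer<byte[]> sink;  // stands in for "upload one part"
    private int count;

    ChunkedBlobOutputStream(int bufferSize, Consumer<byte[]> sink) {
        if (bufferSize <= 0) {
            throw new IllegalArgumentException("bufferSize must be positive");
        }
        this.buffer = new byte[bufferSize];
        this.sink = sink;
    }

    @Override
    public void write(int b) {
        buffer[count++] = (byte) b;
        if (count == buffer.length) {
            flushChunk(); // buffer full: hand it off and start over
        }
    }

    @Override
    public void close() {
        if (count > 0) {
            flushChunk(); // final, possibly smaller chunk
        }
    }

    private void flushChunk() {
        sink.accept(Arrays.copyOf(buffer, count));
        count = 0;
    }

    public static void main(String[] args) {
        List<byte[]> chunks = new ArrayList<>();
        try (ChunkedBlobOutputStream out = new ChunkedBlobOutputStream(4, chunks::add)) {
            for (byte b : "0123456789".getBytes(StandardCharsets.UTF_8)) {
                out.write(b);
            }
        }
        System.out.println(chunks.size()); // 3 chunks: 4 + 4 + 2 bytes
    }
}
```

The writer never needs to know the final size up front, which is the property the old write APIs lacked.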
With types removal changes being available under REST API compatibility, I have removed the block entries for tests which are already fixed.
relates #51816
This change updates the way we handle net new system indices, which are
those that have been newly introduced and do not require any BWC
guarantees around non-system access. These indices will not be included
in wildcard expansions for user searches and operations. Direct access
to these indices will also not be allowed for user searches.
The first index of this type is the GeoIp index, which this change sets
the new flag on.
Closes #72572
Added the dimension parameter to the following field types:
* keyword
* ip
* Numeric field types (integer, long, byte, short)
The dimension parameter is of type boolean (default: false) and is used
to mark that a field is a time series dimension field.
Relates to #74014
This PR returns the get snapshots API to the 7.x format (and transport client behavior) and enhances it for requests that ask for multiple repositories.
The changes for requests that target multiple repositories are:
* Add `repository` field to `SnapshotInfo` and REST response
* Add `failures` map alongside `snapshots` list instead of returning just an exception response as done for single repo requests
* Pagination now works across repositories instead of being per repository for multi-repository requests
closes #69108
closes #43462
There is no reason for Document to be an inner class of ParseContext, especially as it is public and accessed directly from many different places.
This commit takes it out to its own top-level class file, which has the advantage of simplifying ParseContext which could use some love too.
ParserContext is an inner class of Mapper.TypeParser but is used outside of the context of parsing mappers, for instance also to parse runtime fields. Its purpose is to be used to parse mappings in general, and its name is confusing as it differs ever so slightly from ParseContext which is used for parsing incoming documents.
This commit moves ParserContext to be a top-level class, and renames it to MappingParserContext.
This PR changes the way GeoIpDownloader and GeoIpProcessor handle situation when we are unable to update databases for 30 days. In that case:
* GeoIpDownloader will delete all chunks from the .geoip_databases index
* DatabaseRegistry will delete all files on ingest nodes
* GeoIpProcessor will tag the document with a tags: ["_geoip_expired_database"] field (same way as in Logstash)
This change also fixes a bug that breaks DatabaseRegistry when it tries to download databases after only the timestamp was updated (GeoIpDownloader checks if there are new databases and updates just the timestamp when local databases are up to date).
The version field on all lucene Analyzers is unused, and is being removed
in lucene 9. This commit deprecates setting a version on an analyzer in
index settings and removes the related calls to Analyzer.setVersion()
Relates to #74057
Change the formatter config to sort / order imports, and reformat the
codebase. We already had a config file for Eclipse users, so Spotless now
uses that.
The "Eclipse Code Formatter" plugin ought to be able to use this file as
well for import ordering, but in my experiments the results were poor.
Instead, use IntelliJ's `.editorconfig` support to configure import
ordering.
I've also added a config file for the formatter plugin.
Other changes:
* I've quietly enabled the `toggleOnOff` option for Spotless. It was
already possible to disable formatting for sections using the markers
for docs snippets, so enabling this option just accepts this reality
and makes it possible via `formatter:off` and `formatter:on` without
the restrictions around line length. It should still only be used as
a very last resort and with good reason.
* I've removed mention of the `paddedCell` option from the contributing
guide, since I haven't had to use that option for a very long time. I
moved the docs to the spotless config.
This change removes the current GeometryFormat interface and replaces it with a GeometryParser that deals
with parsing formats and a GeometryFormatFactory that deals with fields API formats.
Modularization of the JDK has been ongoing for several years. Recently
in Java 16 the JDK began enforcing module boundaries by default. While
Elasticsearch does not yet use the module system directly, there are
some side effects even for those projects not modularized (eg #73517).
Before we can even begin to think about how to modularize, we must
Prepare The Way by enforcing packages only exist in a single jar file,
since the module system does not allow packages to coexist in multiple
modules.
This commit adds a precommit check to the build which detects split
packages. The expectation is that we will add the existing split
packages to the ignore list so that any new classes will not exacerbate
the problem, and the work to clean up these split packages can be
parallelized.
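As a rough sketch of what such a check does (this is not the build plugin's code; the class name, method name and input shape are assumptions): invert the jar-to-packages mapping and report every package that appears in more than one jar, minus an ignore list.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical illustration of split package detection.
final class SplitPackageCheck {

    /**
     * @param packagesByJar jar name -> packages found in that jar
     * @param ignored       known split packages that should not be reported
     * @return package -> jars containing it, for packages found in two or more jars
     */
    static Map<String, Set<String>> findSplitPackages(
            Map<String, Set<String>> packagesByJar, Set<String> ignored) {
        Map<String, Set<String>> jarsByPackage = new TreeMap<>();
        for (Map.Entry<String, Set<String>> entry : packagesByJar.entrySet()) {
            for (String pkg : entry.getValue()) {
                jarsByPackage.computeIfAbsent(pkg, k -> new TreeSet<>()).add(entry.getKey());
            }
        }
        // A package is only "split" when at least two jars contain it.
        jarsByPackage.values().removeIf(jars -> jars.size() < 2);
        jarsByPackage.keySet().removeAll(ignored);
        return jarsByPackage;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> jars = Map.of(
            "server.jar", Set.of("org.elasticsearch.common", "org.elasticsearch.bootstrap"),
            "core.jar", Set.of("org.elasticsearch.common", "org.elasticsearch.core"));
        System.out.println(findSplitPackages(jars, Set.of()));
    }
}
```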
relates #73525
ParseField is part of the x-content lib, yet it doesn't exist under the
same root package as the rest of the lib. This commit moves the class to
the appropriate package.
relates #73784
When libs/core was created, several classes were moved from server's
o.e.common package, but they were not moved to a new package. Split
packages need to go away long term, so that Elasticsearch can even think
about modularization. This commit moves all the classes under o.e.common
in core to o.e.core.
relates #73784
The recent upgrade of the Azure SDK has caused a few test failures that
have been difficult to debug and do not yet have a fix. In particular, a
change to the netty reactor resolving
(https://github.com/reactor/reactor-netty/issues/1655). We need to wait
for a fix for that issue, so this reverts commit
6c4c4a0ecb.
relates #73493
The org.elasticsearch.bootstrap package exists in server with classes
for starting up Elasticsearch. The elasticsearch-core jar has a handful
of classes that were split out from there, namely java version parsing
and jarhell. This commit moves those classes to a new
org.elasticsearch.jdk package so as to not split the server owned
bootstrap package.
relates #73784
The parent ID join field is an internal field that links child documents to
their parent. Although it's internal, we include it when listing all field
types. This means a search with `"fields": "*"` can attempt to fetch values from
the parent ID field and fail.
This PR applies a simple fix to return an empty result instead of failing.
FieldTypeLookup and MappingLookup expose the getMatchingFieldTypes method to look up matching field type by a string pattern. We have migrated ExistsQueryBuilder to instead rely on getMatchingFieldNames, hence we can go ahead and remove the remaining usages and the method itself.
The remaining usages find specific field types from the mappings, specifically to eagerly load global ordinals and for the join field type. These are operations that are performed only once when loading the mappings, and may be refactored to work differently in the future. For now, we remove getMatchingFieldTypes and, for the two mentioned scenarios, call getMatchingFieldNames(*) and then getFieldType for each of the returned field names. This is a bit wasteful, but performance can be sacrificed for these scenarios in favour of less code to maintain.
This change fixes three bugs, all related to function references in Painless.
First:
Fixes a bug where if a primitive type is used as a capture a VerifyError is returned from the compiler.
The primitive is now boxed automatically on behalf of the user.
Example:
long test(Supplier s) {return s.get();} int i = 1; return test(i::intValue);
Second:
Fix a bug where we output an internal error as opposed to a user error in the case of a static method
used with a non-static capture. We now give the user feedback that they cannot do this instead of no
useful information.
Example:
int test(Function f, String s) {return f.apply(s);} Integer i = Integer.valueOf(1); test(i::parseInt, '1')
Third:
Fix a bug where interface methods using reflection do not match their method handle counterparts
for overridden methods on Object. CharSequence specifically overrides toString, but the
MethodHandle it gets is for Object. Function references now take this into account and write out the
correct constant for a method reference interface.
Example:
CharSequence test(Supplier s) {return s.get();} CharSequence s = 's'; return test(s::toString);
This commit upgrades the Azure SDK to 12.11.0 and Jackson to 2.12.2. The
Jackson upgrade must happen at the same time due to Azure depending on
this new version of Jackson.
closes #66555
closes #67214
Co-authored-by: Francisco Fernández Castaño <francisco.fernandez.castano@gmail.com>
Upgrades to Lucene-8.9 snapshot which includes:
- LUCENE-9507: Custom order for leaves (/cc @mayya-sharipova)
- LUCENE-9935: Enable bulk merge for stored fields with index sort
This commit adds a `cancelled` flag to each cancellable task in the
response to the list tasks API, allowing users to see that a task has
been properly cancelled and will complete as soon as possible.
Closes #72907
This changes the Painless sandbox to be more encompassing of possible
compiler bugs including JVM bugs. This prevents any single script from
crashing a node under a wider array of circumstances that in theory should
be recoverable with possible changes to a user-defined script.
MappingLookup has a method simpleMatchToFullName that attempts
to return all field names that match a given pattern; if no fields match,
then it returns a single-valued collection containing just the pattern that
was originally passed in. This is a fairly confusing semantic.
This PR replaces simpleMatchToFullName with two new methods:
* getMatchingFieldNames(), which returns a set of all mapped field names
that match a pattern. Calling getFieldType() with a name returned by
this method is guaranteed to return a non-null MappedFieldType
* getMatchingFieldTypes, that returns a collection of all MappedFieldTypes
in a mapping that match the passed-in pattern.
This allows us to clean up several call-sites because we know that
MappedFieldTypes returned from these calls will never be null. It also
simplifies object field exists query construction.
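To illustrate the new contract, here is a simplified stand-in (not the MappingLookup implementation; the class, its map-based storage and the `*`-only pattern syntax are assumptions): an unmatched pattern yields an empty set rather than echoing the pattern back, and every returned name resolves to a non-null type.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical stand-in for a mapping lookup; field types are plain strings here.
final class FieldNameMatcher {
    private final Map<String, String> fieldTypes; // field name -> type name

    FieldNameMatcher(Map<String, String> fieldTypes) {
        this.fieldTypes = fieldTypes;
    }

    /** Returns only mapped field names matching the pattern; '*' matches any run of characters. */
    Set<String> getMatchingFieldNames(String pattern) {
        // Quote the literal parts and join them with ".*" where the pattern had '*'.
        String regex = Arrays.stream(pattern.split("\\*", -1))
            .map(Pattern::quote)
            .collect(Collectors.joining(".*"));
        Pattern compiled = Pattern.compile(regex);
        Set<String> matches = new TreeSet<>();
        for (String name : fieldTypes.keySet()) {
            if (compiled.matcher(name).matches()) {
                matches.add(name);
            }
        }
        return matches;
    }

    /** Never null for a name returned by getMatchingFieldNames. */
    String getFieldType(String name) {
        return fieldTypes.get(name);
    }

    public static void main(String[] args) {
        FieldNameMatcher lookup = new FieldNameMatcher(
            Map.of("user.name", "keyword", "user.age", "long", "message", "text"));
        System.out.println(lookup.getMatchingFieldNames("user.*"));   // [user.age, user.name]
        System.out.println(lookup.getMatchingFieldNames("missing*")); // []
    }
}
```

Callers no longer have to null-check the result of a follow-up lookup, which is what allows the call-site clean-ups mentioned above.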
Implements V7-compatible typed REST endpoints for search-related APIs.
Retrofits the REST layer change removed in #41640.
relates main meta issue #51816
relates types removal issue #54160
Prepend `&` to user function names. In future changes user
functions will switch from being static methods to member methods.
The mangled user function names will prevent users from overriding
other script methods.
Refs: #69742
Use an iterator instead of a list when passing around what to delete.
In the case of very large deletes the iterator is much smaller than
the actual list of files to delete (since we save all the prefixes
which adds up if the individual shard folders contain lots of deletes).
Also this commit as a side-effect adjusts a few spots in logging where the
log messages could be catastrophic in size when trace logging is activated.
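A small sketch of why the iterator is cheaper, under the assumption that deletes are grouped by a shared prefix (the class and method names are made up for illustration): full paths are built one at a time on demand, so each prefix is held once instead of being repeated in every materialized path string.

```java
import java.util.Collections;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;

// Hypothetical illustration; not the blob store code.
final class LazyDeletePaths {

    /** Lazily yields prefix + name for every file, without building the full list of paths. */
    static Iterator<String> paths(Map<String, List<String>> filesByPrefix) {
        Iterator<Map.Entry<String, List<String>>> outer = filesByPrefix.entrySet().iterator();
        return new Iterator<String>() {
            private String prefix;
            private Iterator<String> inner = Collections.emptyIterator();

            @Override
            public boolean hasNext() {
                // Skip over folders that have nothing to delete.
                while (!inner.hasNext() && outer.hasNext()) {
                    Map.Entry<String, List<String>> entry = outer.next();
                    prefix = entry.getKey();
                    inner = entry.getValue().iterator();
                }
                return inner.hasNext();
            }

            @Override
            public String next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                return prefix + inner.next();
            }
        };
    }

    public static void main(String[] args) {
        Map<String, List<String>> deletes = new LinkedHashMap<>();
        deletes.put("indices/a/0/", List.of("__1", "__2"));
        deletes.put("indices/b/0/", List.of("__3"));
        paths(deletes).forEachRemaining(System.out::println);
    }
}
```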
Due to problems discovered in #72572 we have to disable the geoip downloader for now. We use ingest.geoip.downloader.enabled.default as a feature flag.
This change also reverts changes to docs.
There should be a singleton for the empty version of this.
All the copying to `String[]` or use as an iterator makes
no sense either when we can just use the list outright.
This commit upgrades the Azure SDK to 12.11.0 and Jackson to 2.12.2. The
Jackson upgrade must happen at the same time due to Azure depending on
this new version of Jackson.
closes #66555
closes #67214
Extract usage of internal API from TestClustersPlugin and PluginBuildPlugin and related plugins and build logic
This includes a refactoring of ElasticsearchDistribution to handle types
better, so that we can differentiate between the Elasticsearch distribution
types supported in TestClustersPlugin and the types only supported in
internal plugins.
It also introduces a set of internal versions of public plugins.
As part of this we also generate the plugin descriptors now.
As a follow up on this we can actually move these publicly used classes into
an extra project (declared as included build).
We keep LoggedExec and VersionProperties effectively public and add a workaround for RestTestBase.
MappingLookup became capable of parsing documents because we needed the search execution context to expose the ability to parse a document that did not depend on a mutable document mapper (only the percolator uses this feature).
In hindsight, parsing documents is quite a specific usecase that does not quite fit in MappingLookup. Also, it introduces the need for MappingLookup to hold IndexSettings, IndexAnalyzers and DocumentParser only for that purpose.
Instead, we can expose the DocumentParser by making it public and make its parse method accept a MappingLookup instance.
We recently replaced some usages of DocumentMapper with MappingLookup in the search layer, as document mapper is mutable which can cause issues. In order to do that, MappingLookup grew and became quite similar to DocumentMapper in what it does and holds.
In many cases it makes sense to use MappingLookup instead of DocumentMapper, and we may even be able to remove DocumentMapper entirely in favour of MappingLookup in the long run.
This commit replaces some of its straight-forward usages.
MapperTestCase has a check that if a field mapper supports stored fields,
those stored fields are available to index time scripts. Many of our mappers
do not support stored fields, and we try and catch this with an assumeFalse
so that those mappers do not run this test. However, this test is fragile - it
does not work for mappers created with an index version below 8.0, and it
misses mappers that always store their values, e.g. match_only_text.
This commit adds a new supportsStoredField method to MapperTestCase,
and overrides it for those mappers that do not support storing values. It
also adds a minimalStoredMapping method that defaults to the minimal
mapping plus a store parameter, which is overridden by match_only_text
because storing is not configurable and always available on this mapper.
The majority of field mappers read a single value from their positioned
XContentParser, and do not need to call nextToken. There is a general
assumption that the same holds for any multifields defined on them, and
so the XContentParser is passed down to their multifields builder as-is.
This assumption does not hold for mappers that accept json objects,
and so we have a second mechanism for passing values around called
'external values', where a mapper can set a specific value on its context
and child mappers can then check for these external values before reading
from xcontent. The disadvantage of this is that every field mapper now
needs to check its context for external values. Because the values are
defined by their java class, we can also know that in the vast majority of
cases this functionality is unused. We have only two mappers that actually
make use of this, CompletionFieldMapper and GeoPointFieldMapper.
This commit removes external values entirely, and replaces it with the ability
to pass a modified XContentParser to multifields. FieldMappers can just check
the parser attached to their context for data and don't need to worry about
multiple sources.
Plugins implementing field mappers will need to take the removal of external
values into account. Implementations that are passing structured objects
as external values should instead use ParseContext.switchParser and
wrap the objects using MapXContentParser.wrapObject().
GeoPointFieldMapper passes on a fake parser that just wraps its input data
formatted as a geohash; CompletionFieldMapper has a slightly more complicated
parser that in general wraps its metadata, but if textOrNull() is called without
the parser being advanced just returns its text input.
Relates to #56063
As required by the MaxMind license, we can't use databases that are older than 30 days, as we could miss a "don't sell" request.
This check was missing before and this change fixes that.
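The check amounts to an age comparison, sketched below. Which timestamp is compared (assumed here to be the last successful update check) and whether the 30-day boundary is inclusive are assumptions, as are the class and method names.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical illustration of the 30-day validity rule.
final class DatabaseValidity {
    static final Duration MAX_AGE = Duration.ofDays(30);

    /** A database is usable only if it was last checked within MAX_AGE of now. */
    static boolean isValid(Instant lastCheck, Instant now) {
        return Duration.between(lastCheck, now).compareTo(MAX_AGE) <= 0;
    }

    public static void main(String[] args) {
        Instant now = Instant.parse("2021-07-31T00:00:00Z");
        System.out.println(isValid(now.minus(Duration.ofDays(29)), now)); // true
        System.out.println(isValid(now.minus(Duration.ofDays(31)), now)); // false
    }
}
```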
The FieldNamesFieldMapper is a metadata mapper defining a field that
can be used for exists queries if a mapper does not use doc values or
norms. Currently, data is added to it via a special method on FieldMapper
that pulls the metadata mapper from a mapping lookup, checks to see
if it is enabled, and then adds the relevant value to a lucene document.
This is one of only two places that pulls a metadata mapper from the
MappingLookup, and it would be nice to remove this method. This commit
refactors field name handling by instead storing the names of fields to
index in the fieldnames field in a set on the ParseContext, and then
building the field itself in FieldNamesFieldMapper.postParse(). This means
that all of the responsibility for enabling indexing, etc, is handled within
the metadata mapper itself.
This fixes the output for the date type, geo point type, and ip type to match as
if it was coming from the fields API. This also ensures it's in a "pretty",
human-readable format.
We shouldn't loop over the listeners under the mutex in `done` since in most use-cases we used `DirectExecutorService`
with this class.
Also, no need to create an `AbstractRunnable` for direct execution. We use this listener on the hot path in authentication
making this a worthwhile optimization I think.
Lastly, no need to clear and thus loop over `listeners`, the list is not used again after the `done` call returns anyway
so no point in retaining it at all (especially when in a number of use cases we add listeners only after the `done` call
so we can also save the instantiation by making the field non-final).
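The pattern described above looks roughly like the following simplified sketch. It is not the actual class: `SimpleListenableResult` and its `Consumer`-based listeners are assumptions, and failure handling is left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical, simplified illustration of notifying listeners outside the mutex.
final class SimpleListenableResult<T> {
    private final Object mutex = new Object();
    private List<Consumer<T>> listeners; // non-final: created lazily, dropped once done
    private boolean done;
    private T result;

    void addListener(Consumer<T> listener) {
        synchronized (mutex) {
            if (!done) {
                if (listeners == null) {
                    listeners = new ArrayList<>();
                }
                listeners.add(listener);
                return;
            }
        }
        // Already completed: run directly on the calling thread, outside the mutex.
        listener.accept(result);
    }

    void done(T value) {
        List<Consumer<T>> toNotify;
        synchronized (mutex) {
            if (done) {
                return; // only the first completion wins
            }
            done = true;
            result = value;
            toNotify = listeners;
            listeners = null; // not needed again, so do not retain (or clear) it
        }
        if (toNotify != null) {
            for (Consumer<T> listener : toNotify) {
                listener.accept(value); // potentially slow callbacks run without the lock
            }
        }
    }

    public static void main(String[] args) {
        SimpleListenableResult<String> result = new SimpleListenableResult<>();
        result.addListener(v -> System.out.println("before done: " + v));
        result.done("ok");
        result.addListener(v -> System.out.println("after done: " + v));
    }
}
```

With direct execution the callbacks run on the completing thread, so holding the mutex while looping would block every concurrent `addListener` call for as long as the slowest listener takes.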
We should also use the sizer for outbound connections like
we do for inbound. Also made it a singleton since we use it in 3
spots now and there's no point instantiating it multiple times.
Back when we indexed every value-containing field from a document in
the _field_names field, it made sense to also index object subpaths
from that field so that we could efficiently run exists queries against
objects. However, since many fields now instead use docvalues or norms
iterators for their exists queries, and don't store their field names in
_field_names, object exists queries cannot make use of these
intermediate paths. We're still storing them, however.
This commit stops storing these intermediate paths in the _field_names
field, as they are unused and just take up extra space.
Related to #71593, we move all build logic that is for the Elasticsearch build only into
the org.elasticsearch.gradle.internal* packages.
This makes it clearer which build logic is meant to be used by external projects.
Ultimately we want to only expose TestCluster and PluginBuildPlugin logic
to third party plugin authors.
This is a very first step towards that direction.
Instead of doing a refresh as part of each index request, perform
this separately after all chunks have been indexed.
Also perform a flush, so that the translog is trimmed and
doesn't contain all these large write operations (1mb) until
an automatic refresh happens (which may take a while since
no other indexing will take place for a while).
We have recently split DocumentMapper creation from parsing Mapping. There was one method leftover that exposed parsing mapping into DocumentMapper, which is generally not needed. Either you only need to parse into a Mapping instance, which is more lightweight, or like in some tests you need to apply a mapping update for which you merge new mappings and get the resulting document mapper. This commit addresses this and removes the method.
This adds a new `match_only_text` field, which indexes the same data as a `text`
field that has `index_options: docs` and `norms: false` and uses the `_source`
for positional queries like `match_phrase`. Unlike `text`, this field doesn't
support scoring.
#71696 introduced a regression to the various shape field mappers,
where they would no longer handle null values. This commit fixes
that regression and adds a testNullValues method to MapperTestCase
to ensure that all field mappers correctly handle nulls.
Fixes #71874
DoubleScriptFieldRangeQuery which is used on runtime fields of type "double"
currently uses simple double type comparison for checking its upper and lower
bounds. Unfortunately it seems that -0.0 == 0.0, but when we want to exclude a
0.0 bound via "lt" the generated range query uses -0.0 as its upper bound which
erroneously includes the 0.0 value. We can use `Double.compare` instead which
seems to handle this edge case well.
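The difference can be seen in isolation with the snippet below. The helper names are hypothetical and this is not the query's code; it only contrasts the primitive comparison with `Double.compare` for a `-0.0` upper bound.

```java
// Hypothetical illustration of the signed-zero edge case.
final class DoubleBoundCheck {

    /** Primitive comparison: -0.0 and 0.0 are treated as equal. */
    static boolean atMostPrimitive(double value, double upper) {
        return value <= upper;
    }

    /** Double.compare orders -0.0 strictly before 0.0. */
    static boolean atMostCompare(double value, double upper) {
        return Double.compare(value, upper) <= 0;
    }

    public static void main(String[] args) {
        double upper = -0.0; // upper bound generated for an exclusive "lt" of 0.0
        System.out.println(atMostPrimitive(0.0, upper)); // true: 0.0 wrongly included
        System.out.println(atMostCompare(0.0, upper));   // false: 0.0 excluded
    }
}
```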
Closes #71786
In update by query requests where max_docs < size and conflicts=proceed
we weren't using the remaining documents from the scroll response in
cases where there were conflicts and in the first bulk request the
successful updates < max_docs. This commit addresses that problem and
uses the remaining documents from the scroll response instead of
requesting a new page.
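The intended behaviour can be sketched as a loop over a single scroll page. This is a simplified model, not the reindex code: hits are plain ids, `tryUpdate` stands in for one bulk item result, and all names are assumptions.

```java
import java.util.List;
import java.util.function.Predicate;

// Hypothetical illustration of reusing the rest of a scroll page after conflicts.
final class ScrollPageConsumer {

    /**
     * Sends bulk batches from one scroll page until maxDocs updates succeeded
     * or the page is exhausted. Returns the number of successful updates.
     */
    static int updateFromPage(List<String> pageHits, int maxDocs, Predicate<String> tryUpdate) {
        int successes = 0;
        int offset = 0;
        while (successes < maxDocs && offset < pageHits.size()) {
            // Only ask for as many documents as are still missing.
            int batchSize = Math.min(maxDocs - successes, pageHits.size() - offset);
            List<String> bulk = pageHits.subList(offset, offset + batchSize);
            offset += batchSize;
            for (String id : bulk) {
                if (tryUpdate.test(id)) { // false models a version conflict
                    successes++;
                }
            }
        }
        return successes;
    }

    public static void main(String[] args) {
        List<String> page = List.of("d1", "d2", "d3", "d4", "d5");
        // d1 conflicts, so a second batch (d3) is taken from the same page.
        System.out.println(updateFromPage(page, 2, id -> !id.equals("d1"))); // 2
    }
}
```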
Closes #63671
Up until now, the name of the script contexts that runtime fields use was internal only. They recently got exposed through the painless execute API. This commit fixes the discrepancy between the field type used to define a runtime field of type keyword and the script context needed to simulate its corresponding script: string_field should be keyword_field.