SourceConfirmedTextQuery uses a QueryVisitor to collect terms from
its inner query to build its internal SimScorer. It is important to hold these
terms in a consistent order so that when scores for each term are summed,
the order of summation is the same as it would be for the inner query. This
commit changes the call to visit to use a LinkedHashSet to ensure that
terms are iterated in the order in which they are collected.
Fixes #98712
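A minimal sketch of why the iteration order matters, using only the JDK: float addition is not associative, so summing the same per-term scores in a different order can give a different total. `TermOrderSketch` and its methods are hypothetical names for illustration, not the query's actual code.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration; not the actual query implementation.
final class TermOrderSketch {

    // De-duplicates terms while keeping the order in which they were visited.
    static List<String> collectInOrder(List<String> visitedTerms) {
        Set<String> terms = new LinkedHashSet<>(visitedTerms);
        return new ArrayList<>(terms);
    }

    // Sums float scores strictly left to right, as a per-term scorer would.
    static float sum(List<Float> scores) {
        float total = 0f;
        for (float score : scores) {
            total += score;
        }
        return total;
    }
}
```

Summing `1e8f, 1f, -1e8f` left to right gives `0.0`, while `1e8f, -1e8f, 1f` gives `1.0`, which is why the collected terms need a stable, predictable order.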
Another round of automated fixes to this, marking things that can be
made static as static. Saves some JIT cycles but also turns some lambdas
from capturing to non-capturing and makes the "utilityness" of some
classes visible.
`match_only_text` does not currently support highlighting via the matches
option of the default highlighter. This commit implements matches on the
backing query for this field, and also fixes a bug where the field type's
value fetcher could hold on to the wrong reference for a source lookup,
causing threading errors.
Data-stream mappings require a @timestamp field to be present and configured
as a date with a specific set of parameters. The index-wide setting of
ignore_malformed can cause problems here if it is set to true, because it needs
to be false for the @timestamp field.
This commit detects if a set of mappings is configured for a data stream by checking
for the presence of a DataStreamTimestampFieldMapper metadata field, and passes
that information on during Mapper construction as part of the MapperBuilderContext.
DateFieldMapper.Builder now checks to see if it is specifically for a data stream timestamp
field, and if it is, sets ignore_malformed to false.
Relates to #96051
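The override described above could look roughly like the following. This is a simplified sketch: `BuilderContextSketch` and `DateBuilderSketch` are hypothetical stand-ins, not the real MapperBuilderContext or DateFieldMapper.Builder.

```java
// Hypothetical stand-in for the builder context that carries the data stream flag.
final class BuilderContextSketch {
    private final boolean dataStream;

    BuilderContextSketch(boolean dataStream) {
        this.dataStream = dataStream;
    }

    boolean isDataStream() {
        return dataStream;
    }
}

// Hypothetical stand-in for a date field builder.
final class DateBuilderSketch {
    static final String TIMESTAMP_FIELD = "@timestamp";

    private final String name;
    private final boolean configuredIgnoreMalformed;

    DateBuilderSketch(String name, boolean configuredIgnoreMalformed) {
        this.name = name;
        this.configuredIgnoreMalformed = configuredIgnoreMalformed;
    }

    // The data stream timestamp field always rejects malformed values,
    // whatever the index-wide setting says; other fields keep their setting.
    boolean effectiveIgnoreMalformed(BuilderContextSketch context) {
        if (context.isDataStream() && TIMESTAMP_FIELD.equals(name)) {
            return false;
        }
        return configuredIgnoreMalformed;
    }
}
```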
We mostly have a handful of `FieldType` values here across all mappers and none of them contain
attributes. There are only so many combinations here, so let's deduplicate these to save some heap and set up
subsequent mapper heap savings.
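One way such deduplication can work is interning immutable values in a shared map. The sketch below assumes a small record, `FieldFlags`, as a hypothetical stand-in for Lucene's `FieldType`; it is not the code from this change.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for an immutable field type with a few flags.
record FieldFlags(boolean indexed, boolean stored, boolean tokenized) {}

final class FieldFlagsCache {
    private static final Map<FieldFlags, FieldFlags> CACHE = new ConcurrentHashMap<>();

    // Returns one shared instance per distinct combination of flags,
    // so equal values created by different mappers occupy heap only once.
    static FieldFlags intern(FieldFlags candidate) {
        FieldFlags existing = CACHE.putIfAbsent(candidate, candidate);
        return existing != null ? existing : candidate;
    }
}
```

This only works because the values are immutable and carry no per-instance attributes; a mutable value could not be shared safely.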
The copyTo builder is really hard to reason about when it comes to
mapper merging, because the `reset` method would actually mutate an
existing mapper. That seems dangerous and the whole thing is quite
inefficient as well. This PR just removes it and uses a copy
constructor for copy on write, avoiding instance creation on mapper
merges here and there and leaving no doubt about these things being
immutable.
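A rough sketch of the copy-on-write idea, with hypothetical names (`CopyToSketch`, `withAdditionalFields`) that are not taken from the actual mapper code: the holder is immutable, and a merge returns the same instance when nothing changes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical immutable holder for copy_to targets.
final class CopyToSketch {
    private final List<String> fields;

    CopyToSketch(List<String> fields) {
        this.fields = List.copyOf(fields); // defensive, unmodifiable copy
    }

    List<String> fields() {
        return fields;
    }

    // Copy on write: reuse this instance if the merge adds nothing,
    // otherwise build a new instance and leave this one untouched.
    CopyToSketch withAdditionalFields(List<String> extra) {
        if (fields.containsAll(extra)) {
            return this;
        }
        List<String> merged = new ArrayList<>(fields);
        for (String field : extra) {
            if (!merged.contains(field)) {
                merged.add(field);
            }
        }
        return new CopyToSketch(merged);
    }
}
```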
Constants for TransportVersion currently live alongside the class
definition. This has been fine since there was only one set of
constants. However, to support serverless, some constants will need to
be defined elsewhere.
This commit moves the existing constants to a new holder class,
TransportVersions. It is almost entirely mechanical, using IntelliJ move
members. The only non mechanical part was slightly shifting how CURRENT
is found, defining a LATEST in TransportVersions that is automatically
calculated (since we already have it, no need to manually define it).
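As an illustration of an automatically calculated LATEST, a holder class can register each constant as it is defined and take the maximum. This is a sketch under assumptions: the `def` helper, the ids, and the use of plain ints are hypothetical and may differ from how TransportVersions does it.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical holder class; ids and names are illustrative only.
final class VersionsHolderSketch {
    // Must be declared first: static initializers run in textual order.
    private static final List<Integer> IDS = new ArrayList<>();

    // Registers an id so LATEST can be derived instead of hand-maintained.
    private static int def(int id) {
        IDS.add(id);
        return id;
    }

    static final int V_A = def(1_000_000);
    static final int V_B = def(2_000_000);
    static final int V_C = def(3_000_000);

    // Computed from the registered constants; no manual update needed.
    static final int LATEST = Collections.max(IDS);

    private VersionsHolderSketch() {}
}
```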
When the subobject property is set to false and we encounter an object
while parsing we need a way to understand if its FieldMapper is able to
parse an object. If that's the case we can provide the entire object to
the FieldMapper; otherwise its name becomes part of the dotted field
name of each internal value.
This has been achieved by adding the `supportsParsingObject()` method
to the `FieldMapper` class. This method defaults to `false` since the
majority of FieldMappers do not support parsing objects and is
overridden to return `true` by the ones that do support objects.
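The decision can be sketched as follows. Only `supportsParsingObject()` comes from the description above; the classes and the `targetFields` helper are hypothetical and much simpler than the real document parser.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified mapper hierarchy.
abstract class FieldMapperSketch {
    // Most mappers cannot consume a whole object, so the default is false.
    boolean supportsParsingObject() {
        return false;
    }
}

final class LeafMapperSketch extends FieldMapperSketch {}

final class ObjectCapableMapperSketch extends FieldMapperSketch {
    @Override
    boolean supportsParsingObject() {
        return true;
    }
}

final class ObjectRoutingSketch {
    // Returns the field names that receive values when an object named
    // objectName with the given inner keys is met and subobjects are disabled.
    static List<String> targetFields(FieldMapperSketch mapper, String objectName, List<String> innerKeys) {
        if (mapper.supportsParsingObject()) {
            return List.of(objectName); // hand the entire object to the mapper
        }
        List<String> dotted = new ArrayList<>();
        for (String key : innerKeys) {
            dotted.add(objectName + "." + key); // object name becomes a prefix
        }
        return dotted;
    }
}
```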
This change swaps test code that directly creates IndexSearcher instances with LuceneTestCase#newSearcher calls
that have the advantage of randomly using concurrency and also randomly using assertion wrappers internally.
While this doesn't guarantee testing the concurrent code path, it should generally increase the likelihood of doing so.
* Mapped field types searchable with doc values
When using TSDB, time series metrics aren't indexed but do have doc values.
Field caps should report those fields as searchable.
Replacing the remaining usages that I could automatically replace
and a couple that I did by hand in this PR.
Also, added the same shortcut to the single node tests to save some
duplication there.
Most relevant changes:
- add api to allow concurrent query rewrite (GITHUB-11838, apache/lucene#11840)
- knn query rewrite (apache/lucene#12160)
- Integrate the incubating Panama Vector API (apache/lucene#12311)
As part of this commit I moved the ES codebase off of overriding or relying on the deprecated rewrite(IndexReader) method in favour of using rewrite(IndexSearcher) instead. For score functions, I opted not to break existing plugins and to create a new IndexSearcher whenever we rewrite a filter; otherwise we'd need to change the ScoreFunction#rewrite signature to take a searcher instead of a reader.
Co-authored-by: ChrisHegarty <christopher.hegarty@elastic.co>
Fixes #82794. Upgrade the spotless plugin, which addresses the issue
around formatting `instanceof` expressions. Formatting of statements
including lambdas seems to have improved too.
Document parsing methods currently throw MapperParsingException. This
isn't very helpful, as it doesn't contain any information about where the parse
error happened - it is designed for parsing mappings, which are realised into
java maps before being examined. This commit introduces a new exception
specifically for document parsing that extends XContentException, so that
it reports the current position of the parser as part of its error message.
Fixes #85083
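To illustrate the idea of an exception that carries the parser position, here is a minimal JDK-only sketch. `DocumentParsingExceptionSketch` and its message layout are hypothetical; the real exception builds on the XContent classes rather than raw line and column numbers.

```java
// Hypothetical sketch: an exception that prefixes its message with the
// position at which the parser was when the error occurred.
final class DocumentParsingExceptionSketch extends RuntimeException {
    private final int line;
    private final int column;

    DocumentParsingExceptionSketch(int line, int column, String message) {
        super("[" + line + ":" + column + "] " + message);
        this.line = line;
        this.column = column;
    }

    int line() {
        return line;
    }

    int column() {
        return column;
    }
}
```

A caller that knows the current parser position would throw, for example, `new DocumentParsingExceptionSketch(3, 17, "failed to parse field [age]")`, and the message would start with `[3:17]`.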
This allows us to be more conservative about what needs to be loaded
when using the fields API, and opens up the possibility of avoiding
using stored fields or source altogether if we can use doc values to
fetch values.
This commit also uses this new information from ValueFetchers to
more efficiently preload stored fields for the `fields` API, while
still allowing the lazy loading of individual fields if they are asked
for by scripts or runtime fields which cannot be introspected.
IndexAnalyzers is currently always a concrete class wrapping several
Maps of NamedAnalyzers. This means that whenever it is used it needs
to instantiate all of its component analyzers, making testing much heavier
than it needs to be. It also means that things like overriding analysis for
legacy indexes are pushed into mapper parameters, rather than being
handled in a single place.
This commit makes IndexAnalyzers into an interface, with an anonymous
concrete implementation that handles reloading and closing for index
shards.
In certain scenarios, running a MultiTerm query sets a `null` rewrite method. While `null` is usually checked, there are branches in the code where this is not adequately checked.
Additionally, `MultiTermQuery#setRewriteMethod` has been deprecated for a while. So, to correct this bug,
- Remove calls to `MultiTermQuery#setRewriteMethod` where possible
- Always check for `null` rewrite method
closes: https://github.com/elastic/elasticsearch/issues/94364
Added position time_series_metric:
* start creating position time_series_metric
* Add yaml tests for queries and aggs
* Disallow multi-values for geo_point as ts-metric
* Limit running on older versions, since some parts of the time-series syntax were not supported on all versions
* ScaledFloatFieldMapper does not support POSITION. We should only test it against COUNTER and GAUGE, since it only supports those two metric types
* Expand unit tests and allow parsing of dimension. We expand the tests to cover all cases tested in DoubleFieldMapperTests which also tests the behaviour of setting the dimension to true or false, so we enable parsing that for symmetry, but reject `true` as illegal for geo_point.
* Add unit tests for position metric multi-values
Lucene has deprecated LeafReader#document() in favour of a new
LeafReader#storedFields() method. This commit updates all places
in elasticsearch that use the deprecated API.
Relates to #94005
Fields that have the time_series_metric attribute set to counter in non-TSDB indices should use the number value source type instead of the counter value source type. In other words, these fields are not handled as counters at search time.
Relates to #93539
Currently, parsing a rank_features field where the key of the feature contains a
dot leads to hard-to-understand JSON parsing errors because we interpret the
dot in the feature name as the start of a new object.
This change allows parsing those dots but throws a more legible IllegalArgumentException
if we encounter a dot in a feature name. We shouldn't allow dots in feature
names because they can create ambiguity in mappings with more than one
'rank_features' field whose names contain dots and partially overlap.
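The check itself is small. The sketch below uses a hypothetical helper, `FeatureNameCheck.validate`, and an illustrative error message; neither is copied from the mapper.

```java
// Hypothetical helper; the message wording is illustrative only.
final class FeatureNameCheck {

    // Returns the feature name unchanged, or fails with a readable error
    // if the name contains a dot.
    static String validate(String fieldName, String featureName) {
        if (featureName.indexOf('.') >= 0) {
            throw new IllegalArgumentException(
                "feature name [" + featureName + "] in field [" + fieldName + "] must not contain dots"
            );
        }
        return featureName;
    }
}
```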
This adds term query capabilities for rank_features fields. Term queries against rank_features fields are not scored in the same way as term queries against regular fields. This is because the stored feature values take advantage of the term frequency storage mechanism, and thus regular BM25 does not work.
Instead, a term query against a rank_features field is very similar to a linear rank_feature query. If more complicated combinations of features and values are required, the rank_feature query should be used.
Lucene introduced new numeric fields that index both points and doc
values. This has the same semantics as indexing one field for points and
another one for doc values as we did before, but covering both data
structures in a single field yielded a speedup in Lucene's nightly
benchmarks (see annotation
[AH](http://people.apache.org/~mikemccand/lucenebench/sparseResults.html#index_throughput))
which would be interesting to get too.
This commit does not switch to factory methods for queries such as
`LongField#newRangeQuery` for now, we'll need to look into it in a
follow-up.
SourceLookup mixes up several concerns - lazy loading, map access to scripts,
different access providers - and duplicates logic (such as that choosing how to
apply filtering) that is better handled directly in the Source interface.
This commit removes SourceLookup entirely and replaces it with a new
SourceProvider interface, with a simple stored fields reader implementation.
SearchLookup implements this interface directly, and the fetch phase uses
a custom implementation to provide its separately loaded source to fetch-time
scripts.
We have a couple of existing enum mapping parameters that specify their enum type in lowercase letters.
That is convenient to avoid having to convert enum names back and forth between uppercase and lowercase depending
on what's needed, yet it does not follow coding conventions in that constants should be in capital letters.
More importantly, moving the `on_script_error` mapping parameter to a streamlined enum mapping parameter is
not possible with lowercase type names because one of its values is `continue`, which is a reserved keyword in Java.
It becomes a requirement that the actual value for an enum based mapping parameter can potentially differ from
the enum name. In general, the type name will be in capital letters, while the parameter value will be lowercase.
With this commit we refactor the enum mapping parameters to provide their types in uppercase, while
users will keep on providing the corresponding values in lowercase. This only affects how the enum
types are represented internally. We can leverage toString for the enum types to do the lowercasing when needed.
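A small sketch of the pattern, assuming a hypothetical enum named `OnScriptErrorSketch`: the constants are uppercase, and `toString` yields the lowercase value users write in mappings.

```java
import java.util.Locale;

// Hypothetical enum; CONTINUE could not be declared in lowercase because
// continue is a reserved keyword in Java.
enum OnScriptErrorSketch {
    FAIL,
    CONTINUE;

    // The user-facing parameter value is the lowercase form of the name.
    @Override
    public String toString() {
        return name().toLowerCase(Locale.ROOT);
    }

    // Resolves a user-provided lowercase value back to the constant.
    static OnScriptErrorSketch fromParameter(String value) {
        for (OnScriptErrorSketch candidate : values()) {
            if (candidate.toString().equals(value)) {
                return candidate;
            }
        }
        throw new IllegalArgumentException("Unknown value [" + value + "]");
    }
}
```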
Use locale-independent `Strings.format` method instead of `String.format(Locale.ROOT, ...)`.
Inline `ESTestCase.forbidden` calls with `Strings.format` for consistency's sake.
Add `Strings.format` alias in `common.Strings`
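Such an alias is presumably a thin wrapper over the JDK call it replaces. A minimal sketch, using a hypothetical class name `StringsSketch` rather than the real `Strings` class:

```java
import java.util.Locale;

// Hypothetical stand-in for the alias described above.
final class StringsSketch {

    // Always formats with Locale.ROOT so output does not depend on the
    // default locale of the JVM (for example the decimal separator).
    static String format(String format, Object... args) {
        return String.format(Locale.ROOT, format, args);
    }

    private StringsSketch() {}
}
```

For comparison, `String.format(Locale.GERMANY, "%.1f", 1.5)` produces `1,5`, while the wrapper always produces `1.5`.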
This commit adds a new test framework for configuring and orchestrating
test clusters for both Java and YAML REST testing. This will eventually
replace the existing "test-clusters" Gradle plugin and the build-time
cluster orchestration.
This adds support for the `field` scripting API in many but not all
cases. Before this change numbers, dates, and IPs supported the `field`
API when running with _source in synthetic mode because they always have
doc values. This change adds support for `match_only_text`, `store`d
`keyword` fields, and `store`d `text` fields. Two remaining field
configurations work with synthetic _source and do not work with `field`:
* A `text` field with a sub-`keyword` field that has `doc_values`
* A `text` field with a sub-`keyword` field that is `store`d

We currently work out whether or not a mapper should be storing additional
values for synthetic source by looking at the DocumentParserContext. However,
this value does not change for the lifetime of the mapper - it is defined by
metadata on the root mapper and is immutable - and DocumentParserContext
feels like the wrong place for this information as it holds context specific
to the document being parsed.
This commit moves synthetic source status information from DocumentParserContext
to MapperBuilderContext instead. Mappers which need this information retrieve
it at build time and hold it on final fields.
This commit adds a new values source type to identify counter fields in
TSDB indices. Because aggregate values over this field only make sense
in specific time series contexts, this should prevent users trying to issue
nonsensical aggregations (eg, sum or avg) over counters.
SourceLookup combines a mutable lookup object that can be advanced
to different documents with access to a document's source. This combination
can make reasoning about where a Source comes from difficult, particularly
in the FetchPhase where the source gets passed around a great deal.
This commit extracts a Source interface from SourceLookup, giving read-only
access to the source, and changes various FetchPhase interfaces to take this
read-only view instead of a full lookup. You can now tell easily if a consumer
of the source is going to try and move it to a different document. As part of this
change we add a new docId parameter to various ValueFetcher methods, as
previously this could be accessed via the SourceLookup.
This adds support for `ignore_malformed` to numeric fields other than
`scaled_float` in synthetic `_source`. Their values are saved to a
stored field and loaded to render the `_source`.
This makes sure that all field types that support `ignore_malformed` do
so in the same way.
Production changes:
* All mappers have an `ignoreMalformed` method that must return `true` if
the field accepts the `ignore_malformed` mapping parameter and it was
configured. It defaults to `false` because many fields either don't
have a concept of "malformed" value or don't have the ability to
ignore malformed values.
* Fix the `scaled_float` field to store its field name in `_ignored` if
it ignores any malformed values. This is how all other field mappers
work.
Test changes:
* `MapperTestCase` forces subclasses to declare whether they
`supportIgnoreMalformed` or not.
* If `MapperTestCase` subclasses `supportIgnoreMalformed` they must
define some `exampleMalformedValues`.
* `MapperTestCase` always gains three new tests:
  * One that creates a field without setting `ignore_malformed` and
    verifies that all `exampleMalformedValues` throw expected errors.
  * One that explicitly configures `ignore_malformed` to false and, if
    `supportIgnoreMalformed`, verifies the errors again. If not
    `supportIgnoreMalformed`, it verifies that the parameter is unknown.
  * One that explicitly configures `ignore_malformed` to true and, if
    `supportIgnoreMalformed`, verifies that parsing doesn't produce
    errors and correctly produces `_ignored`. If not
    `supportIgnoreMalformed`, it verifies that the parameter is unknown.
* Moved some subclasses of `MapperTestCase` from
`internalClusterTests` to `tests`. This isn't strictly required but
that's the right place for them.
The current default implementation of isAggregatable on MappedFieldType
tries to construct a field data builder, and returns true or false depending on
whether an exception was thrown during construction. This is fairly fragile, and
is becoming increasingly so with the introduction of field data contexts, so that
a non-aggregatable field type may in fact provide field data to scripts.
This commit changes the default implementation to check for docvalues instead
of directly building a fielddata builder, and adds checks to MapperTestCase that
verify these implementations work correctly.
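The new default can be pictured like this. It is a simplified sketch with hypothetical classes, not the real MappedFieldType: the base type answers from its doc values flag, and a type that can provide field data by other means overrides the default.

```java
// Hypothetical, simplified field type.
class FieldTypeSketch {
    private final boolean hasDocValues;

    FieldTypeSketch(boolean hasDocValues) {
        this.hasDocValues = hasDocValues;
    }

    boolean hasDocValues() {
        return hasDocValues;
    }

    // Default: aggregatable exactly when doc values are available.
    // No builder is constructed and no exception is used for control flow.
    boolean isAggregatable() {
        return hasDocValues();
    }
}

// Hypothetical type without doc values that is aggregatable only when an
// explicit option enables in-memory field data.
final class FielddataTypeSketch extends FieldTypeSketch {
    private final boolean fielddataEnabled;

    FielddataTypeSketch(boolean fielddataEnabled) {
        super(false);
        this.fielddataEnabled = fielddataEnabled;
    }

    @Override
    boolean isAggregatable() {
        return fielddataEnabled;
    }
}
```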