270 Commits

Author SHA1 Message Date
David Turner 1eda6ac74b
Extract ESIntegTestCase#prepareSearch (#101175)
Relates #101172
2023-10-20 06:18:58 -04:00
Alan Woodward edab22a31c
Consistent scores for multi-term SourceConfirmedTextQuery (#100846)
SourceConfirmedTextQuery uses a QueryVisitor to collect terms from
its inner query to build its internal SimScorer. It is important to hold these
terms in a consistent order so that when scores for each term are summed,
the order of summation is the same as it would be for the inner query. This
commit changes the call to visit to use a LinkedHashSet to ensure that
terms are iterated in the order in which they are collected.
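
The ordering concern described above can be illustrated outside Lucene. The following is only a sketch, not the actual query code: `TermOrderDemo`, `collectTerms` and `sumInOrder` are hypothetical names. It shows that a `LinkedHashSet` iterates in insertion order and that summing the same float values in a different order can give a different result.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical demo class; not part of Elasticsearch or Lucene.
final class TermOrderDemo {

    // Collect terms into a LinkedHashSet so that iteration follows insertion
    // order (duplicates are dropped, the first occurrence keeps its position).
    static Set<String> collectTerms(List<String> visited) {
        return new LinkedHashSet<>(visited);
    }

    // Sum per-term scores one by one, in the order given.
    static float sumInOrder(float... scores) {
        float sum = 0f;
        for (float s : scores) {
            sum += s;
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(collectTerms(List.of("quick", "brown", "fox", "quick")));
        // The same three values, summed in two different orders.
        System.out.println(sumInOrder(1e8f, -1e8f, 1f));
        System.out.println(sumInOrder(1e8f, 1f, -1e8f));
    }
}
```

Because float addition is not associative, the two sums differ, which is why a stable iteration order matters when scores must match exactly.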

Fixes #98712
2023-10-16 10:11:10 +01:00
Armin Braun b7eafce32c
Make some practically static methods static (#97565)
Another round of automated fixes to this, marking things that can be
made static as static. Saves some JIT cycles but also turns some lambdas
from capturing to non-capturing and makes the "utilityness" of some
classes visible.
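
As a rough illustration of the capturing versus non-capturing distinction mentioned above, consider the sketch below. The class and method names are hypothetical and not taken from the Elasticsearch code base.

```java
import java.util.function.IntUnaryOperator;

// Hypothetical example class; not taken from the Elasticsearch code base.
final class StaticLambdaDemo {

    // Instance method that never touches instance state: "practically static".
    int doubledInstance(int x) {
        return x * 2;
    }

    // The same logic as a static method.
    static int doubledStatic(int x) {
        return x * 2;
    }

    // Calling an instance method makes the lambda capture `this`.
    IntUnaryOperator capturing() {
        return x -> doubledInstance(x);
    }

    // Calling a static method needs no captured state.
    static IntUnaryOperator nonCapturing() {
        return x -> doubledStatic(x);
    }

    public static void main(String[] args) {
        System.out.println(new StaticLambdaDemo().capturing().applyAsInt(21));
        System.out.println(nonCapturing().applyAsInt(21));
    }
}
```

Both lambdas compute the same value; the difference is only in whether an enclosing instance has to be carried along.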
2023-10-06 23:37:07 +02:00
Alan Woodward bb5ed9899b
Implement matches() on SourceConfirmedTextQuery (#100134)
`match_only_text` does not currently support highlighting via the matches
option of the default highlighter. This commit implements matches on the
backing query for this field, and also fixes a bug where the field type's
value fetcher could hold on to the wrong reference for a source lookup,
causing threading errors.
2023-10-04 10:03:48 +01:00
Alan Woodward 4e1fb3fca5
Automatically disable `ignore_malformed` on datastream `@timestamp` fields (#99346)
Data-stream mappings require a @timestamp field to be present and configured
as a date with a specific set of parameters. The index-wide setting of
ignore_malformed can cause problems here if it is set to true, because it needs
to be false for the @timestamp field.

This commit detects if a set of mappings is configured for a datastream by checking
for the presence of a DataStreamTimestampFieldMapper metadata field, and passes
that information on during Mapper construction as part of the MapperBuilderContext.
DateFieldMapper.Builder now checks to see if it is specifically for a data stream timestamp
field, and if it is, sets ignore_malformed to false.
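
The flow described above might look roughly like the following sketch. It uses simplified, hypothetical stand-ins (`BuilderContext`, `DateFieldBuilder`) rather than the real MapperBuilderContext and DateFieldMapper.Builder, and it assumes the timestamp field is recognised by its name.

```java
// Simplified, hypothetical stand-ins for MapperBuilderContext and
// DateFieldMapper.Builder; the real classes carry far more state.
final class TimestampIgnoreMalformedDemo {

    static final class BuilderContext {
        final boolean isDataStream;

        BuilderContext(boolean isDataStream) {
            this.isDataStream = isDataStream;
        }
    }

    static final class DateFieldBuilder {
        private final String name;
        private final boolean configuredIgnoreMalformed;

        DateFieldBuilder(String name, boolean configuredIgnoreMalformed) {
            this.name = name;
            this.configuredIgnoreMalformed = configuredIgnoreMalformed;
        }

        // Effective ignore_malformed value once the builder context is known.
        boolean effectiveIgnoreMalformed(BuilderContext context) {
            // Assumption: the timestamp field is recognised by name here.
            if (context.isDataStream && "@timestamp".equals(name)) {
                return false;
            }
            return configuredIgnoreMalformed;
        }
    }

    public static void main(String[] args) {
        BuilderContext dataStream = new BuilderContext(true);
        System.out.println(new DateFieldBuilder("@timestamp", true).effectiveIgnoreMalformed(dataStream));
        System.out.println(new DateFieldBuilder("event.created", true).effectiveIgnoreMalformed(dataStream));
    }
}
```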

Relates to #96051
2023-09-13 15:02:22 +01:00
Armin Braun 574fb05946
Deduplicate org.apache.lucene.document.FieldType instances across mappers (#99361)
We mostly have a handful of `FieldType` values here across all mappers and none of them contain
attributes. There are only so many combinations here, so let's deduplicate these to save some heap and set up
subsequent mapper heap savings.
2023-09-08 22:18:35 +02:00
Armin Braun f1a376c317
Remove CopyTo.Builder (#99368)
The copyTo builder is really hard to reason about when it comes to
mapper merging, because the `reset` method would actually mutate an
existing mapper. That seems dangerous, and the whole thing is quite
inefficient as well. This PR removes it and uses a copy
constructor for copy on write, avoiding instance creation on mapper
merges here and there and leaving no doubt about these things being
immutable.
2023-09-08 13:24:31 -04:00
Ryan Ernst 19257125b1
Move transport version constants to TransportVersions (#97990)
Constants for TransportVersion currently live alongside the class
definition. This has been fine since there was only one set of
constants. However, to support serverless, some constants will need to
be defined elsewhere.

This commit moves the existing constants to a new holder class,
TransportVersions. It is almost entirely mechanical, using IntelliJ move
members. The only non mechanical part was slightly shifting how CURRENT
is found, defining a LATEST in TransportVersions that is automatically
calculated (since we already have it, no need to manually define it).
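
One way to picture a holder class with an automatically derived latest value is the sketch below. It is an assumption-laden illustration: the constant names, the ids and the `def` registration helper are all invented here, and the real TransportVersions may compute LATEST differently.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch; ids and constant names are invented for illustration.
final class TransportVersionsDemo {

    static final class TransportVersion implements Comparable<TransportVersion> {
        final int id;

        TransportVersion(int id) {
            this.id = id;
        }

        @Override
        public int compareTo(TransportVersion other) {
            return Integer.compare(id, other.id);
        }
    }

    // Holder class: the constants live here instead of on TransportVersion itself.
    static final class TransportVersions {
        // Declared before the constants so that def() can register them.
        private static final List<TransportVersion> ALL = new ArrayList<>();

        static final TransportVersion V_1 = def(1_000_000);
        static final TransportVersion V_2 = def(2_000_000);
        static final TransportVersion V_3 = def(3_000_000);

        // Derived from the registered constants, so it needs no manual update.
        static final TransportVersion LATEST = Collections.max(ALL);

        private static TransportVersion def(int id) {
            TransportVersion version = new TransportVersion(id);
            ALL.add(version);
            return version;
        }
    }

    public static void main(String[] args) {
        System.out.println(TransportVersions.LATEST.id);
    }
}
```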
2023-09-06 15:14:41 -04:00
Matteo Piergiovanni e719057209
Explicit parsing object capabilities of FieldMappers (#98684)
When the subobjects property is set to false and we encounter an object
while parsing, we need a way to determine whether its FieldMapper is able to
parse an object. If it is, we can provide the entire object to
the FieldMapper; otherwise its name becomes part of the dotted field
name of each internal value.

This has been achieved by adding the `supportsParsingObject()` method
to the `FieldMapper` class. This method defaults to `false`, since the
majority of FieldMappers do not support parsing objects, and is
overridden to return `true` by the ones that do support objects.
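
A minimal sketch of this default-and-override pattern follows. Only the `supportsParsingObject()` method name comes from the commit message; the simplified `FieldMapper`, the two subclasses and the `fieldsToParse` helper are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, heavily simplified model of the parsing decision.
final class ParsingObjectDemo {

    abstract static class FieldMapper {
        final String name;

        FieldMapper(String name) {
            this.name = name;
        }

        // Defaults to false: most mappers only parse leaf values.
        boolean supportsParsingObject() {
            return false;
        }
    }

    static final class LeafOnlyMapper extends FieldMapper {
        LeafOnlyMapper(String name) {
            super(name);
        }
    }

    static final class ObjectAwareMapper extends FieldMapper {
        ObjectAwareMapper(String name) {
            super(name);
        }

        @Override
        boolean supportsParsingObject() {
            return true;
        }
    }

    // With subobjects disabled, either hand the whole object to the mapper
    // or flatten its keys into dotted field names.
    static List<String> fieldsToParse(FieldMapper mapper, Map<String, Object> object) {
        List<String> result = new ArrayList<>();
        if (mapper.supportsParsingObject()) {
            result.add(mapper.name);
        } else {
            for (String key : object.keySet()) {
                result.add(mapper.name + "." + key);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Object> object = new LinkedHashMap<>();
        object.put("lat", 1.0);
        object.put("lon", 2.0);
        System.out.println(fieldsToParse(new ObjectAwareMapper("location"), object));
        System.out.println(fieldsToParse(new LeafOnlyMapper("location"), object));
    }
}
```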
2023-08-22 10:16:59 +02:00
David Turner dadaaa8315 AwaitsFix for #98712 2023-08-22 09:11:21 +01:00
Christoph Büscher 207a995fce
Use newSearcher instead of new IndexSearcher in tests where possible (#98110)
This change swaps test code that directly creates IndexSearcher instances with LuceneTestCase#newSearcher calls
that have the advantage of randomly using concurrency and also randomly using assertion wrappers internally.
While this doesn't guarantee testing the concurrent code path, it should generally increase the likelihood of doing so.
2023-08-22 10:49:21 +07:00
tmgordeeva 171bcbb3e1
Mapped field types searchable with doc values (#97724)
When using TSDB, time series metrics aren't indexed but do have doc values.
Field caps should report those fields as searchable.
2023-08-15 19:29:49 -07:00
eyalkoren 3d36b08d28
Fix `fields` API with `subobjects: false` (#97092) 2023-07-12 11:35:18 +03:00
Martijn van Groningen 55561588f5
Fix mapping parsing logic to determine synthetic source is active. (#97355)
Take index mode into account during parsing of the mapping when
determining whether source is synthetic

Fixes #97320
2023-07-06 06:42:14 -04:00
Simon Cooper a873e26cf7
Convert IndexVersion.CURRENT to a method with a pluggable interface (#97132) 2023-06-27 14:47:32 +01:00
Armin Braun 3f8ee82ef8
Use indices admin client shortcut in most integration tests (#96946)
Replacing the remaining usages that I could automatically replace
and a couple that I did by hand in this PR.
Also, added the same shortcut to the single node tests to save some
duplication there.
2023-06-20 13:32:59 +02:00
Simon Cooper 71c12262fb
Migrate index created version to IndexVersion (#96066) 2023-06-14 09:43:31 +01:00
Luca Cavanna e5768d9335
Upgrade Lucene to a 9.7.0 snapshot (#96433)
Most relevant changes:

- add api to allow concurrent query rewrite (GITHUB-11838, apache/lucene#11840)
- knn query rewrite (apache/lucene#12160)
- Integrate the incubating Panama Vector API (apache/lucene#12311)

As part of this commit I moved the ES codebase off of overriding or relying on the deprecated rewrite(IndexReader) method in favour of using rewrite(IndexSearcher) instead. For score functions, I opted not to break existing plugins and instead create a new IndexSearcher whenever we rewrite a filter; otherwise we'd need to change the ScoreFunction#rewrite signature to take a searcher instead of a reader.

Co-authored-by: ChrisHegarty <christopher.hegarty@elastic.co>
2023-05-31 10:17:10 +02:00
Mayya Sharipova 433ce88852
Rank_feature field null_value test and small edits (#96392)
Correct and add more tests for adding null_value parameter for the
rank_feature field.

Relates to #95811, closes #95149
2023-05-30 07:33:40 -04:00
Marantidis Kiriakos e44edcebf0
Add null_value for rank_feature field
Closes #95149
2023-05-23 12:52:27 -04:00
Rory Hunter fe1083f6c5
Upgrade spotless plugin to 6.17.0 (#94994)
Fixes #82794. Upgrade the spotless plugin, which addresses the issue
around formatting `instanceof` expressions. Formatting of statements
including lambdas seems to have improved too.
2023-04-04 10:03:32 +01:00
Alan Woodward 093e36c875
Introduce DocumentParsingException (#92646)
Document parsing methods currently throw MapperParsingException. This
isn't very helpful, as it doesn't contain any information about where the parse
error happened - it is designed for parsing mappings, which are realised into
java maps before being examined. This commit introduces a new exception
specifically for document parsing that extends XContentException, so that
it reports the current position of the parser as part of its error message.
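
The idea of an exception that carries the parser position can be sketched as follows. This is not the real class: the sketch extends RuntimeException rather than XContentException, and the `Location` type and the `[line:column]` message prefix are assumptions made for illustration.

```java
// Hypothetical sketch of a position-aware parsing exception.
final class DocumentParsingExceptionDemo {

    static final class Location {
        final int line;
        final int column;

        Location(int line, int column) {
            this.line = line;
            this.column = column;
        }
    }

    static final class DocumentParsingException extends RuntimeException {
        final Location location;

        DocumentParsingException(Location location, String message) {
            // Prefix the message with the position where parsing failed.
            super("[" + location.line + ":" + location.column + "] " + message);
            this.location = location;
        }
    }

    public static void main(String[] args) {
        try {
            throw new DocumentParsingException(new Location(3, 17), "failed to parse field [age]");
        } catch (DocumentParsingException e) {
            System.out.println(e.getMessage());
        }
    }
}
```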

Fixes #85083
2023-03-31 12:14:19 +01:00
Alan Woodward 131da70321
ValueFetchers now return a StoredFieldsSpec (#94820)
This allows us to be more conservative about what needs to be loaded
when using the fields API, and opens up the possibility of avoiding
using stored fields or source altogether if we can use doc values to
fetch values.

This commit also uses this new information from ValueFetchers to 
more efficiently preload stored fields for the `fields` API, while
still allowing the lazy loading of individual fields if they are asked
for by scripts or runtime fields which cannot be introspected.
2023-03-30 10:46:43 +01:00
Simon Cooper 56d53da381
Migrate LuceneDocument.getFields(String) to a List (#94830) 2023-03-29 11:08:36 +01:00
Alan Woodward 35da97214c
Make IndexAnalyzers an interface (#94819)
IndexAnalyzers is currently always a concrete class wrapping several
Maps of NamedAnalyzers. This means that whenever it is used it needs
to instantiate all of its component analyzers, making testing much heavier
than it needs to be. It also means that things like overriding analysis for
legacy indexes are pushed into mapper parameters, rather than being
handled in a single place.

This commit makes IndexAnalyzers into an interface, with an anonymous
concrete implementation that handles reloading and closing for index
shards.
2023-03-28 16:07:08 +01:00
Adrien Grand b56c2df203
Upgrade to lucene-9.6.0-snapshot-f5d1e1c787c. (#94494) 2023-03-16 16:49:54 +01:00
Benjamin Trent bc2755f0df
Fix NPE thrown by prefix and regex query in strange scenarios (#94369)
In certain scenarios, running a MultiTerm query sets a `null` rewrite method. While `null` is usually checked, there are branches in the code where this is not adequately checked.

Additionally, `MultiTermQuery#setRewriteMethod` has been deprecated for a while. So, to correct this bug:

 - Remove calls to `MultiTermQuery#setRewriteMethod` where possible
 - Always check for `null` rewrite method


closes: https://github.com/elastic/elasticsearch/issues/94364
2023-03-08 09:36:17 -05:00
Craig Taverner e7a2c44bbf
Support position time_series_metric on geo_point fields (#93946)
Added position time_series_metric:

* start creating position time_series_metric
* Add yaml tests for queries and aggs
* Disallow multi-values for geo_point as ts-metric
* Limit running on older versions, since some parts of the time-series syntax were not supported on all versions
* ScaledFloatFieldMapper does not support POSITION, so we should only test it against COUNTER and GAUGE, the two metric types it supports
* Expand unit tests and allow parsing of dimension. We expand the tests to cover all cases tested in DoubleFieldMapperTests, which also tests the behaviour of setting the dimension to true or false, so we enable parsing that for symmetry but reject `true` as illegal for geo_point.
* Add unit tests for position metric multi-values
2023-03-01 12:57:06 +01:00
Alan Woodward e0fb33a4a5
Remove uses of deprecated LeafReader#document() method (#93984)
Lucene has deprecated LeafReader#document() in favour of a new
LeafReader#storedFields() method. This commit updates all places
in elasticsearch that use the deprecated API.

Relates to #94005
2023-02-24 12:16:28 +00:00
Martijn van Groningen df4a8f72c8
Don't treat counter fields outside of tsdb as counters. (#93800)
Fields that have the time_series_metric attribute set to counter in non-tsdb indices should use the number value source type instead of the counter value source type. In effect, these fields are not handled as counters at search time.

Relates to #93539
2023-02-20 08:03:21 +01:00
Christoph Büscher a3f4f0bb21
Fix rank_features parsing for dots in feature name (#93756)
Currently, parsing a rank_features field where the key of the feature contains a
dot leads to hard-to-understand JSON parsing errors, because we interpret the
dot in the feature name as the start of a new object.
This change allows parsing those dots but throws a more legible IllegalArgumentException
when we encounter a dot in a feature name. We shouldn't allow dots in feature
names because they can create ambiguity in mappings with more than one
'rank_features' field whose names contain dots and partially overlap.
2023-02-14 16:40:21 +01:00
Benjamin Trent 323a13ac3f
Add `term` query support to rank_features mapped field (#93247)
This adds term query capabilities for rank_features fields. Term queries against rank_features fields are not scored in the same way as those against regular fields. This is because the stored feature values take advantage of the term frequency storage mechanism, and thus regular BM25 does not work.

Instead, a term query against a rank_features field is very similar to a linear rank_feature query. If more complicated combinations of features and values are required, the rank_feature query should be used.
2023-02-01 13:32:13 -05:00
Adrien Grand c21ee47610
Switch to Lucene's new IntField/LongField/FloatField/DoubleField. (#93165)
Lucene introduced new numeric fields that index both points and doc
values. This has the same semantics as indexing one field for points and
another one for doc values as we did before, but covering both data
structures in a single field yielded a speedup in Lucene's nightly
benchmarks (see annotation
[AH](http://people.apache.org/~mikemccand/lucenebench/sparseResults.html#index_throughput))
which would be interesting to get too.

This commit does not switch to factory methods for queries such as
`LongField#newRangeQuery` for now; we'll need to look into it in a
follow-up.
2023-01-31 16:09:42 -05:00
Simon Cooper c513b2bcc6
Migrate VersionedWriteable & NamedDiff to TransportVersion take 2 (#93242)
Re-apply "Migrate VersionedWriteable & NamedDiff to TransportVersion (#93076)"

This reverts commit 48f96090dc.
2023-01-26 09:49:08 +00:00
Simon Cooper 48f96090dc Revert "Migrate VersionedWriteable & NamedDiff to TransportVersion (#93076)"
This reverts commit bef85c66e7.
2023-01-25 16:16:10 +00:00
Simon Cooper bef85c66e7
Migrate VersionedWriteable & NamedDiff to TransportVersion (#93076)
InferenceConfig is kept on Version, as that existed before VersionedNamedWriteable came along
2023-01-25 16:03:38 +00:00
Martijn van Groningen 07b9d2f296
Don't index counter and gauge scaled_float/unsigned_long fields by default (#92917)
This is the same change as #92768, but for the `scaled_float` and `unsigned_long` field types.
The support for these field types was forgotten.
2023-01-16 13:13:06 +01:00
Alan Woodward c720cdbbbf
Replace SourceLookup with SourceProvider interface (#91540)
SourceLookup mixes up several concerns - lazy loading, map access to scripts,
different access providers - and duplicates logic (such as that choosing how to
apply filtering) that is better handled directly in the Source interface.

This commit removes SourceLookup entirely and replaces it with a new
SourceProvider interface, with a simple stored fields reader implementation.
SearchLookup implements this interface directly, and the fetch phase uses
a custom implementation to provide its separately loaded source to fetch-time
scripts.
2023-01-12 16:17:46 +00:00
Luca Cavanna c53becb310
Refactor enum mappings parameter to allow for capital case types (#92548)
We have a couple of existing enum mapping parameters that specify their enum type in lowercase letters.
That is convenient because it avoids converting enum names back and forth between uppercase and lowercase depending
on what's needed, yet it does not follow the coding convention that constants should be in capital letters.
More importantly, moving the `on_script_error` mapping parameter to a streamlined enum mapping parameter is
not possible with lowercase type names because one of its values is `continue`, which is a Java reserved keyword.
It therefore becomes a requirement that the actual value for an enum-based mapping parameter can differ from
the enum name. In general, the type name will be in capital letters, while the parameter value will be lowercase.

With this commit we refactor the enum mappings parameter to provide their types in capital case, while
users will keep on providing the corresponding values in lowercase. This only affects how the enum
types are represented internally. We can leverage toString for the enum types to do the lowercasing when needed.
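
A minimal sketch of this approach, assuming an enum named `OnScriptError` with the values `FAIL` and `CONTINUE` and a hypothetical `fromValue` lookup helper, could look like this:

```java
import java.util.Locale;

// Hypothetical sketch; the enum name and the fromValue helper are assumptions.
final class EnumParamDemo {

    // `continue` is a reserved word in Java, so the constant must be upper case.
    enum OnScriptError {
        FAIL,
        CONTINUE;

        // The user-facing parameter value is the lowercased constant name.
        @Override
        public String toString() {
            return name().toLowerCase(Locale.ROOT);
        }

        // Resolve a user-supplied lowercase value back to the constant.
        static OnScriptError fromValue(String value) {
            for (OnScriptError candidate : values()) {
                if (candidate.toString().equals(value)) {
                    return candidate;
                }
            }
            throw new IllegalArgumentException("Unknown value [" + value + "]");
        }
    }

    public static void main(String[] args) {
        System.out.println(OnScriptError.fromValue("continue").name());
        System.out.println(OnScriptError.CONTINUE);
    }
}
```

Lowercasing with `Locale.ROOT` keeps the conversion independent of the default locale of the JVM.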
2023-01-11 23:28:54 +01:00
Artem Prigoda 2bc7398754
Use `Strings.format` instead of `String.format(Locale.ROOT, ...)` in tests (#92106)
Use the locale-independent `Strings.format` method instead of `String.format(Locale.ROOT, ...)`.
Inline `ESTestCase.forbidden` calls with `Strings.format` for consistency's sake.
Add a `Strings.format` alias in `common.Strings`.
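
To show why a locale-independent helper is useful, here is a small sketch. The `format` method below is a hypothetical stand-in written for this example, assumed to behave like a thin wrapper over `String.format(Locale.ROOT, ...)`; it is not the actual `Strings.format` source.

```java
import java.util.Locale;

// Hypothetical stand-in for a locale-independent format helper.
final class LocaleFormatDemo {

    // Always formats with Locale.ROOT, regardless of the JVM default locale.
    static String format(String pattern, Object... args) {
        return String.format(Locale.ROOT, pattern, args);
    }

    public static void main(String[] args) {
        System.out.println(format("%.2f", 1.5));
        // For comparison: an explicit locale can change the decimal separator.
        System.out.println(String.format(Locale.GERMANY, "%.2f", 1.5));
    }
}
```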
2023-01-03 19:28:27 +01:00
Mark Vieira c2eda511de
Add JUnit rule based integration test cluster orchestration framework (#92379)
This commit adds a new test framework for configuring and orchestrating
test clusters for both Java and YAML REST testing. This will eventually
replace the existing "test-clusters" Gradle plugin and the build-time
cluster orchestration.
2022-12-21 15:33:46 -08:00
Nik Everett 9d0b0bad86
Support synthetic _source for _doc_count field (#91465)
This adds synthetic `_source` support for the `_doc_count` field so
downsampling should play nicely with synthetic `_source`.
2022-11-10 13:43:33 -05:00
Nik Everett 74d0d19c0f
Synthetic _source: support `field` in many cases (#89950)
This adds support for the `field` scripting API in many but not all
cases. Before this change numbers, dates, and IPs supported the `field`
API when running with _source in synthetic mode because they always have
doc values. This change adds support for `match_only_text`, `store`d
`keyword` fields, and `store`d `text` fields. Two remaining field
configurations work with synthetic _source and do not work with `field`:
* A `text` field with a sub-`keyword` field that has `doc_values`
* A `text` field with a sub-`keyword` field that is `store`d

![image](https://user-images.githubusercontent.com/215970/189217841-4378ed42-e454-42c1-aaf0-6c2c041b29be.png)
2022-11-10 10:44:06 -05:00
Alan Woodward 41ab45a5d9
Report synthetic source status in MapperBuilderContext (#91400)
We currently work out whether or not a mapper should be storing additional
values for synthetic source by looking at the DocumentParserContext. However,
this value does not change for the lifetime of the mapper - it is defined by
metadata on the root mapper and is immutable - and DocumentParserContext
feels like the wrong place for this information as it holds context specific
to the document being parsed.

This commit moves synthetic source status information from DocumentParserContext
to MapperBuilderContext instead. Mappers which need this information retrieve
it at build time and hold it on final fields.
2022-11-08 14:55:16 +00:00
Alan Woodward 34a0093928
Add a new values source type returned by metric counters (#90680)
This commit adds a new values source type to identify counter fields in
TSDB indices. Because aggregate values over this field only make sense
in specific time series contexts, this should prevent users trying to issue
nonsensical aggregations (eg, sum or avg) over counters.
2022-10-31 20:42:56 +00:00
Alan Woodward 0013d46538
Extract Source interface from SourceLookup (#90762)
SourceLookup combines a mutable lookup object that can be advanced
to different documents with access to a document's source. This combination
can make reasoning about where a Source comes from difficult, particularly
in the FetchPhase where the source gets passed around a great deal.

This commit extracts a Source interface from SourceLookup, giving read-only
access to the source, and changes various FetchPhase interfaces to take this
read-only view instead of a full lookup. You can now tell easily if a consumer
of the source is going to try and move it to a different document. As part of this
change we add a new docId parameter to various ValueFetcher methods, as
previously this could be accessed via the SourceLookup.
2022-10-11 19:50:30 +01:00
Rene Groeschke 43a0377735
Update forbiddenapis to 3.4 (#90624)
Fix breaking changes to source validation after change in default jdk rule set
2022-10-06 16:52:06 +02:00
Nik Everett bc49392bfb
Support malformed numbers in synthetic _source (#90428)
This adds support for `ignore_malformed` to numeric fields other than
`scaled_float` in synthetic `_source`. Their values are saved to a
stored field and loaded to render the `_source`.
2022-10-04 12:17:30 -04:00
Nik Everett f4fad2548f
Always support ignore_malformed in the same way (#90565)
This makes sure that all field types that support `ignore_malformed` do
so in the same way.

Production changes:
* All mappers have an `ignoreMalformed` method that must return `true` if
  the field accepts the `ignore_malformed` mapping parameter and it was
  configured. It defaults to `false` because many fields either don't
  have a concept of a "malformed" value or don't have the ability to
  ignore malformed values.
* Fix the `scaled_float` field to store its field name in `_ignored` if
  it ignores any malformed values. This is how all other field mappers
  work.

Test changes:
* `MapperTestCase` forces subclasses to declare whether they
  `supportIgnoreMalformed` or not.
* If `MapperTestCase` subclasses `supportIgnoreMalformed` they must
  define some `exampleMalformedValues`.
* `MapperTestCase` always grows three new tests:
  * One that creates a field without setting `ignore_malformed` and
    verifies that all `exampleMalformedValues` throw expected errors.
  * One that explicitly configures `ignore_malformed` to false and, if
    `supportIgnoreMalformed`, verifies the errors again. If not
    `supportIgnoreMalformed`, it verifies that the parameter is unknown.
  * One that explicitly configures `ignore_malformed` to true and, if
    `supportIgnoreMalformed`, verifies that parsing doesn't produce
    errors and correctly produces `_ignored`. If not
    `supportIgnoreMalformed`, it verifies that the parameter is unknown.
* Moved some subclasses of `MapperTestCase` from
  `internalClusterTests` to `tests`. This isn't strictly required but
  that's the right place for them.
2022-10-03 06:18:02 -04:00
Alan Woodward de76c62546
Don't use fielddataBuilder to test for aggregatability (#90185)
The current default implementation of isAggregatable on MappedFieldType
tries to construct a field data builder, and returns true or false depending on
whether an exception was thrown during construction. This is fairly fragile, and
is becoming increasingly so with the introduction of field data contexts, so that
a non-aggregatable field type may in fact provide field data to scripts.

This commit changes the default implementation to check for docvalues instead
of directly building a fielddata builder, and adds checks to MapperTestCase that
verify these implementations work correctly.
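
The difference between the two approaches can be sketched with a toy model. Everything here is hypothetical (the simplified `FieldType`, the `ScriptOnlyField` subclass and the method names other than `isAggregatable`); it only illustrates why probing via an exception can give a misleading answer.

```java
// Hypothetical toy model; not the real MappedFieldType.
final class AggregatableDemo {

    abstract static class FieldType {
        abstract boolean hasDocValues();

        // May throw if the field cannot supply field data at all.
        abstract Object fielddataBuilder();

        // Older approach: probe by building and treat an exception as "no".
        boolean isAggregatableByProbing() {
            try {
                fielddataBuilder();
                return true;
            } catch (IllegalArgumentException e) {
                return false;
            }
        }

        // Newer default: check for doc values directly.
        boolean isAggregatable() {
            return hasDocValues();
        }
    }

    // A field that can serve field data to scripts but has no doc values.
    static final class ScriptOnlyField extends FieldType {
        @Override
        boolean hasDocValues() {
            return false;
        }

        @Override
        Object fielddataBuilder() {
            return new Object();
        }
    }

    public static void main(String[] args) {
        FieldType field = new ScriptOnlyField();
        System.out.println(field.isAggregatableByProbing());
        System.out.println(field.isAggregatable());
    }
}
```

In this model the probing variant reports the script-only field as aggregatable, while the doc-values check does not.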
2022-09-23 13:33:45 +01:00