291 Commits

Author SHA1 Message Date
Ignacio Vera 8a9f4fed55
Remove explicit SearchResponse references from LegacyGeo, Aggregations and parent-join modules (#101250) 2023-10-24 17:46:25 +02:00
David Turner 9794c6e205
Use ESIntegTestCase#prepareSearch more (#101179)
The refactoring in #101175 only covered all the one-arg call sites. This
PR does the rest.
2023-10-20 18:33:00 +01:00
David Turner 1eda6ac74b
Extract ESIntegTestCase#prepareSearch (#101175)
Relates #101172
2023-10-20 06:18:58 -04:00
Armin Braun ca6295e582
Remove more explicit references to SearchResponse in tests (#101092)
Remove `assertSearchResponse`, which was just an alias for
`assertNoFailures`, and then clean up many spots in the result
by combining the hit count and no-failure assertions into a single
method.

Follow-up to #100966
2023-10-19 17:53:13 +02:00
Armin Braun 03ea4bbe6e
Remove more explicit references to SearchResponse in tests (#101052)
Follow-up to #100966, introducing the new combined assertion `assertSearchHitsWithoutFailures`
to combine no-failure, count, and id assertions into one block.
2023-10-18 20:27:52 +02:00
Armin Braun dcaba064dd
Remove more explicit SearchResponse references from test code (#100985)
Follow-up to #100966 adding more overrides to assertions that
consume a request builder.
2023-10-18 07:20:01 +02:00
Armin Braun bae6991fb3
Remove ~600 references to SearchResponse in tests (#100966)
We'd like to make `SearchResponse` reference counted and pooled but there are around 6k
instances of tests that create a `SearchResponse` local variable that would need to be
released manually to avoid leaks in the tests.
This does away with about 10% of these spots by adding an override for `assertHitCount`
that handles the actual execution of the search request and its release automatically
and making use of it in all spots where the `.get()` on the request builder could be inlined
semi-automatically and in a straightforward fashion without other code changes.
2023-10-17 15:43:36 +02:00
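A minimal, self-contained sketch of the pattern #100966 describes: the assertion helper runs the request and releases the response itself, so the test never holds a local reference. `PooledResponse` and this `assertHitCount` are hypothetical stand-ins written for illustration, not the Elasticsearch test classes.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

public class HitCountAssertionSketch {

    /** Hypothetical stand-in for a reference-counted, pooled response. */
    static final class PooledResponse {
        private final long totalHits;
        private final AtomicInteger refCount = new AtomicInteger(1);

        PooledResponse(long totalHits) {
            this.totalHits = totalHits;
        }

        long totalHits() {
            return totalHits;
        }

        void release() {
            refCount.decrementAndGet();
        }

        boolean isReleased() {
            return refCount.get() == 0;
        }
    }

    /**
     * Hypothetical helper: executes the request, checks the hit count and
     * always releases the response, even when the assertion fails.
     */
    static void assertHitCount(Supplier<PooledResponse> request, long expected) {
        PooledResponse response = request.get();
        try {
            if (response.totalHits() != expected) {
                throw new AssertionError("expected " + expected + " hits but got " + response.totalHits());
            }
        } finally {
            response.release();
        }
    }

    public static void main(String[] args) {
        PooledResponse[] seen = new PooledResponse[1];
        // The call site no longer declares a response variable that must be released by hand.
        assertHitCount(() -> seen[0] = new PooledResponse(3), 3);
        System.out.println("released: " + seen[0].isReleased());
    }
}
```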
Armin Braun b7eafce32c
Make some practically static methods static (#97565)
Another round of automated fixes to this, marking things that can be
made static as static. Saves some JIT cycles but also turns some lambdas
from capturing to non-capturing and makes the "utilityness" of some
classes visible.
2023-10-06 23:37:07 +02:00
Mark Tozzi 6660503592
Aggs error codes part 1 (#99963)
As part of our effort to increase the supportability of Elasticsearch,
this PR changes many aggregations errors from being 500 class (which is
the default for `AggregationExecutionException`) to 400 class (which is
the default for `IllegalArgumentException`).  All of these cases are
errors which should not be retried, as they are failing directly related
to the content of the request and/or state of the index.

There are definitely more cases where we are returning an incorrect
error code, but for this PR I focused on just changing the low hanging
fruit.
2023-10-04 16:12:34 -04:00
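To illustrate the effect described in #99963, the sketch below maps exception types to status classes using the defaults named in the commit message. The `statusFor` and `validateSize` helpers and the local `AggregationExecutionException` class are hypothetical simplifications, not the Elasticsearch implementation.

```java
public class AggErrorStatusSketch {

    /** Hypothetical stand-in for an exception type that defaults to a 500-class status. */
    static class AggregationExecutionException extends RuntimeException {
        AggregationExecutionException(String message) {
            super(message);
        }
    }

    /** Hypothetical mapping: IllegalArgumentException is a client error, everything else a server error. */
    static int statusFor(RuntimeException e) {
        return e instanceof IllegalArgumentException ? 400 : 500;
    }

    /** A failure caused by the request content, so retrying the same request cannot succeed. */
    static void validateSize(int size) {
        if (size <= 0) {
            throw new IllegalArgumentException("[size] must be greater than 0, got [" + size + "]");
        }
    }

    public static void main(String[] args) {
        try {
            validateSize(0);
        } catch (RuntimeException e) {
            System.out.println(statusFor(e) + ": " + e.getMessage());
        }
    }
}
```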
Ignacio Vera bcdd7d5f42
Set ParentAggregationBuilder not to support concurrent execution (#99809) 2023-09-25 09:29:27 +02:00
Alan Woodward 4e1fb3fca5
Automatically disable `ignore_malformed` on datastream `@timestamp` fields (#99346)
Data-stream mappings require a @timestamp field to be present and configured
as a date with a specific set of parameters. The index-wide setting of
ignore_malformed can cause problems here if it is set to true, because it needs
to be false for the @timestamp field.

This commit detects if a set of mappings is configured for a datastream by checking
for the presence of a DataStreamTimestampFieldMapper metadata field, and passes
that information on during Mapper construction as part of the MapperBuilderContext.
DateFieldMapper.Builder now checks to see if it is specifically for a data stream timestamp
field, and if it is, sets ignore_malformed to false.

Relates to #96051
2023-09-13 15:02:22 +01:00
Ryan Ernst 19257125b1
Move transport version constants to TransportVersions (#97990)
Constants for TransportVersion currently live alongside the class
definition. This has been fine since there was only one set of
constants. However, to support serverless, some constants will need to
be defined elsewhere.

This commit moves the existing constants to a new holder class,
TransportVersions. It is almost entirely mechanical, using IntelliJ move
members. The only non mechanical part was slightly shifting how CURRENT
is found, defining a LATEST in TransportVersions that is automatically
calculated (since we already have it, no need to manually define it).
2023-09-06 15:14:41 -04:00
Ignacio Vera 424a4c6d71
Hide IndexSearcher in AggregatorTestCase (#98924)
Hide the creation of the index searcher from the implementers by changing the signature of 
AggregatorTestCase#searchAndReduce and AggregatorTestCase#createAggregationContext to take
an IndexReader instead of an IndexSearcher.
2023-08-28 16:29:21 +08:00
Matteo Piergiovanni e719057209
Explicit parsing object capabilities of FieldMappers (#98684)
When the subobject property is set to false and we encounter an object
while parsing, we need a way to know whether its FieldMapper is able to
parse an object. If it is, we can provide the entire object to
the FieldMapper; otherwise its name becomes part of the dotted field
name of each internal value.

This has been achieved by adding the `supportsParsingObject()` method
to the `FieldMapper` class. This method defaults to `false` since the
majority of FieldMappers do not support parsing objects and is
overridden to return `true` by the ones that do support objects.
2023-08-22 10:16:59 +02:00
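The capability check from #98684 can be sketched roughly as follows. Only the `supportsParsingObject()` name comes from the commit message; the mapper classes and the `handleObject` helper are hypothetical and heavily simplified.

```java
public class ParsingObjectSketch {

    /** Hypothetical, simplified stand-in for a field mapper. */
    static abstract class FieldMapper {
        /** Defaults to false because most mappers cannot consume a whole object. */
        boolean supportsParsingObject() {
            return false;
        }
    }

    /** Hypothetical mapper that only understands leaf values. */
    static final class LeafOnlyMapper extends FieldMapper {}

    /** Hypothetical mapper that opts in to receiving whole objects. */
    static final class ObjectAwareMapper extends FieldMapper {
        @Override
        boolean supportsParsingObject() {
            return true;
        }
    }

    /** Decides what happens to an object value found under `name` when subobjects are disabled. */
    static String handleObject(String name, FieldMapper mapper) {
        if (mapper.supportsParsingObject()) {
            return "pass the whole object to mapper [" + name + "]";
        }
        return "use [" + name + ".] as a dotted prefix for each inner value";
    }

    public static void main(String[] args) {
        System.out.println(handleObject("location", new ObjectAwareMapper()));
        System.out.println(handleObject("labels", new LeafOnlyMapper()));
    }
}
```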
Armin Braun 63e64ae61b
Cleanup Stream usage in various spots (#97306)
Cleans up lots of spots where we did weird things around streams: redundant stream creation, redundant collecting
before adding all the collected elements to another collection, redundant streams for joining strings,
use of the less efficient `Collectors.toList`, and in a few cases incorrect reliance on the result being mutable.
2023-07-03 14:24:57 +02:00
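A short sketch of the kinds of cleanup #97306 mentions, using standard JDK calls only. It assumes `Stream#toList` (Java 16+) as the replacement for `Collectors.toList`, which the commit message does not state explicitly; the method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

public class StreamCleanupSketch {

    /** Stream#toList returns an unmodifiable list, so callers cannot rely on mutating it. */
    static List<String> trimmed(List<String> names) {
        return names.stream().map(String::trim).toList();
    }

    /** When a mutable result is really needed, ask for one explicitly. */
    static List<String> trimmedMutable(List<String> names) {
        return names.stream().map(String::trim).collect(Collectors.toCollection(ArrayList::new));
    }

    /** Joining strings needs no stream at all. */
    static String joined(List<String> names) {
        return String.join(",", names);
    }

    public static void main(String[] args) {
        List<String> names = List.of(" a", "b ");
        List<String> mutable = trimmedMutable(names);
        mutable.add("c"); // fine: this list was explicitly requested as an ArrayList
        System.out.println(joined(mutable));
    }
}
```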
Simon Cooper a873e26cf7
Convert IndexVersion.CURRENT to a method with a pluggable interface (#97132) 2023-06-27 14:47:32 +01:00
Armin Braun 3f8ee82ef8
Use indices admin client shortcut in most integration tests (#96946)
Replacing the remaining usages that I could automatically replace
and a couple that I did by hand in this PR.
Also, added the same shortcut to the single node tests to save some
duplication there.
2023-06-20 13:32:59 +02:00
Simon Cooper 71c12262fb
Migrate index created version to IndexVersion (#96066) 2023-06-14 09:43:31 +01:00
Luca Cavanna e5768d9335
Upgrade Lucene to a 9.7.0 snapshot (#96433)
Most relevant changes:

- add api to allow concurrent query rewrite (GITHUB-11838, apache/lucene#11840)
- knn query rewrite (apache/lucene#12160)
- Integrate the incubating Panama Vector API (apache/lucene#12311)

As part of this commit I moved the ES codebase off of overriding or relying on the deprecated rewrite(IndexReader) method in favour of using rewrite(IndexSearcher) instead. For score functions, I went for not breaking existing plugins and creating a new IndexSearcher whenever we rewrite a filter, otherwise we'd need to change the ScoreFunction#rewrite signature to take a searcher instead of a reader.

Co-authored-by: ChrisHegarty <christopher.hegarty@elastic.co>
2023-05-31 10:17:10 +02:00
Ignacio Vera c05181528a
Use DirectoryReader instead of IndexReader in AggregatorTestCase (#95876) 2023-05-08 07:25:10 +02:00
Ignacio Vera 9bbea47899
use #newIndexSearcher in all AggregatorTestCase implementations (#95796) 2023-05-04 11:12:26 +02:00
Armin Braun c41bda9e3a
Dry up remaining verbose index setting building in tests (#95652)
Last spots I could easily find via regex.
Follow-up to #95569
2023-04-28 11:18:07 +02:00
Alan Woodward 093e36c875
Introduce DocumentParsingException (#92646)
Document parsing methods currently throw MapperParsingException. This
isn't very helpful, as it doesn't contain any information about where the parse
error happened - it is designed for parsing mappings, which are realised into
java maps before being examined. This commit introduces a new exception
specifically for document parsing that extends XContentException, so that
it reports the current position of the parser as part of its error message.

Fixes #85083
2023-03-31 12:14:19 +01:00
Alan Woodward 131da70321
ValueFetchers now return a StoredFieldsSpec (#94820)
This allows us to be more conservative about what needs to be loaded
when using the fields API, and opens up the possibility of avoiding
using stored fields or source altogether if we can use doc values to
fetch values.

This commit also uses this new information from ValueFetchers to 
more efficiently preload stored fields for the `fields` API, while
still allowing the lazy loading of individual fields if they are asked
for by scripts or runtime fields which cannot be introspected.
2023-03-30 10:46:43 +01:00
Adrien Grand 0c10cef668
Cut over from Field to StringField when applicable. (#94540)
The most recent Lucene update made `StringField` more efficient than `Field`
when indexing simple keywords. This PR cuts over remaining places where we use
`Field` to index keywords to `StringField` instead.
2023-03-23 15:37:51 +01:00
Adrien Grand b56c2df203
Upgrade to lucene-9.6.0-snapshot-f5d1e1c787c. (#94494) 2023-03-16 16:49:54 +01:00
Armin Braun 2819b11523
Dry up setting index settings in internalClusterTests (#90204)
We have this neat utility method for this, let's use it throughout
to save hundreds of LoC and do the setting update in a consistent
way throughout instead of using various variants.
2023-02-28 13:23:49 +01:00
Simon Cooper 4c46ccacaa
Migrate the remaining uses of Version to TransportVersion (#93384)
Remove get/setVersion methods
2023-02-13 09:15:53 +00:00
Alan Woodward c0a3bf7e60
Remove custom NoRewriteMatchNoDocsQuery (#93638)
We added a special NoRewriteMatchNoDocsQuery to get around some
aggressive rewriting that meant match phrase prefix queries wouldn't be
correctly highlighted. Since lucene 9.5, however, the unified highlighter
no longer rewrites queries against an empty searcher, and so this extra
query is now unnecessary.
2023-02-10 09:22:55 +00:00
Adrien Grand af8fccf4b4
Use a combined field to index terms and doc values on keyword fields. (#93579)
Instead of indexing separately a `StringField` and a `SortedSetDocValuesField`,
this commit switches to a single field that indexes both terms and doc values.
On Lucene's nightly benchmarks on the NYC Taxis dataset, a similar change
yielded a ~3% indexing throughput increase.
2023-02-08 14:16:43 +01:00
Simon Cooper c513b2bcc6
Migrate VersionedWriteable & NamedDiff to TransportVersion take 2 (#93242)
Re-apply "Migrate VersionedWriteable & NamedDiff to TransportVersion (#93076)"

This reverts commit 48f96090dc.
2023-01-26 09:49:08 +00:00
Simon Cooper 48f96090dc
Revert "Migrate VersionedWriteable & NamedDiff to TransportVersion (#93076)"
This reverts commit bef85c66e7.
2023-01-25 16:16:10 +00:00
Simon Cooper bef85c66e7
Migrate VersionedWriteable & NamedDiff to TransportVersion (#93076)
InferenceConfig is kept on Version, as that existed before VersionedNamedWriteable came along
2023-01-25 16:03:38 +00:00
Artem Prigoda 2bc7398754
Use `Strings.format` instead of `String.format(Locale.ROOT, ...)` in tests (#92106)
Use the locale-independent `Strings.format` method instead of `String.format(Locale.ROOT, ...)`.
Inline `ESTestCase.forbidden` calls with `Strings.format` for consistency's sake.
Add a `Strings.format` alias in `common.Strings`.
2023-01-03 19:28:27 +01:00
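The idea behind #92106 can be sketched as a thin wrapper that pins the locale. This `format` method is a hypothetical stand-in for the `Strings.format` helper named in the commit, whose exact signature is not shown here.

```java
import java.util.Locale;

public class StringsFormatSketch {

    /** Hypothetical helper: formats with Locale.ROOT so output does not depend on the JVM default locale. */
    static String format(String format, Object... args) {
        return String.format(Locale.ROOT, format, args);
    }

    public static void main(String[] args) {
        // Call sites shrink from String.format(Locale.ROOT, ...) to format(...).
        System.out.println(format("indexed %d docs into [%s]", 42, "index-1"));
    }
}
```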
Mark Vieira c2eda511de
Add JUnit rule based integration test cluster orchestration framework (#92379)
This commit adds a new test framework for configuring and orchestrating
test clusters for both Java and YAML REST testing. This will eventually
replace the existing "test-clusters" Gradle plugin and the build-time
cluster orchestration.
2022-12-21 15:33:46 -08:00
Dimitris Athanasiou f7e0d477f6
Optimize composite agg with leading global ordinal value source (#92197)
When queries are present in a search with a composite agg with a leading
source that is of type `GlobalOrdinalValuesSource` there is an optimization
we can do. In particular, once the composite queue is full, we know the
range of ordinals we are interested in from the source. Thus, we can add
a competitive iterator to the `LeafBucketCollector` that skips documents
that are out of the competitive range.

This commit adds that optimization. In a dataset I have experimented with
that has ~31M docs I observed a 5x improvement in a simple search with
a range query that matched ~28M docs and with `size = 5` over a keyword
field whose cardinality was 200.

Co-authored-by: Adrien Grand <jpountz@gmail.com>
2022-12-21 16:25:00 +00:00
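A simplified illustration of the competitive-range idea in #92197: once the queue of top ordinals is full, documents whose ordinal falls outside the current range are skipped. The real change adds a competitive iterator to the `LeafBucketCollector`; this sketch uses a plain loop and a `TreeSet` instead, and every name in it is hypothetical.

```java
import java.util.TreeSet;

public class CompetitiveOrdinalSketch {

    /**
     * Keeps the `size` smallest ordinals in `topOrds` and returns how many
     * documents were skipped because their ordinal was not competitive.
     */
    static int collect(long[] docOrds, int size, TreeSet<Long> topOrds) {
        int skipped = 0;
        for (long ord : docOrds) {
            // Once the queue is full, anything above its largest ordinal cannot make it in.
            if (topOrds.size() == size && ord > topOrds.last()) {
                skipped++;
                continue;
            }
            topOrds.add(ord);
            if (topOrds.size() > size) {
                topOrds.pollLast(); // evict the ordinal that just became non-competitive
            }
        }
        return skipped;
    }

    public static void main(String[] args) {
        TreeSet<Long> topOrds = new TreeSet<>();
        int skipped = collect(new long[] { 5, 3, 9, 1, 7, 3 }, 2, topOrds);
        System.out.println("top ordinals " + topOrds + ", skipped " + skipped + " docs");
    }
}
```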
Alan Woodward 547c8327b2
Allow FetchSubPhaseProcessors to report their required stored fields (#91269)
Loading of stored fields is currently handled directly in FetchPhase, with
some fairly complex logic examining various bits of the FetchContext to work
out what fields need to be loaded. This is further complicated by synthetic
source, which may have its own stored field requirements.

This commit tries to separate out these concerns a little by adding a new
StoredFieldsSpec record that holds information about which stored fields
need to be loaded. Each FetchSubPhaseProcessor can now report a
StoredFieldsSpec detailing what its requirements are, and these specs can
be merged together, along with requirements from a SourceLoader, to
determine up-front what fields should be loaded by the StoredFieldLoader.
The stored fields themselves are added into the SearchHit by a new
StoredFieldsPhase, which handles alias resolution and value post-
processing. The logic to determine when source should be loaded and
when not, based on the presence of script fields or stored fields, is
moved into FetchContext, which highlights some inconsistencies that
can be fixed in follow-up commits.
2022-11-10 08:40:22 +00:00
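The merging of requirements described in #91269 might look roughly like the sketch below. The `Spec` record is a hypothetical, reduced version of the `StoredFieldsSpec` record the commit introduces; its actual components and method names may differ.

```java
import java.util.HashSet;
import java.util.Set;

public class StoredFieldsSpecSketch {

    /** Hypothetical, reduced spec: whether source is needed and which stored fields to load. */
    record Spec(boolean requiresSource, Set<String> requiredStoredFields) {

        static final Spec NONE = new Spec(false, Set.of());

        /** Combines two specs so that the stored field loader can be configured once, up front. */
        Spec merge(Spec other) {
            Set<String> fields = new HashSet<>(requiredStoredFields);
            fields.addAll(other.requiredStoredFields);
            return new Spec(requiresSource || other.requiresSource, Set.copyOf(fields));
        }
    }

    public static void main(String[] args) {
        // Each sub-phase processor reports its own requirements.
        Spec highlighter = new Spec(true, Set.of());
        Spec storedFields = new Spec(false, Set.of("title", "author"));

        Spec merged = Spec.NONE.merge(highlighter).merge(storedFields);
        System.out.println(merged);
    }
}
```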
Alan Woodward 41ab45a5d9
Report synthetic source status in MapperBuilderContext (#91400)
We currently work out whether or not a mapper should be storing additional
values for synthetic source by looking at the DocumentParserContext. However,
this value does not change for the lifetime of the mapper - it is defined by
metadata on the root mapper and is immutable - and DocumentParserContext
feels like the wrong place for this information as it holds context specific
to the document being parsed.

This commit moves synthetic source status information from DocumentParserContext
to MapperBuilderContext instead. Mappers which need this information retrieve
it at build time and hold it on final fields.
2022-11-08 14:55:16 +00:00
Luca Cavanna 18942d5b11
Enhance nested depth tracking when parsing queries (#90425)
When parsing queries on the coordinating node, there is currently no way to share state between the different parsing methods (`fromXContent`). The only query that supports a parse context is bool query, which uses the context to track nested depth of queries, added with #66204. This nested depth tracking mechanism is not 100% accurate as it tracks bool queries only, while there are many more query types that can hold other queries and hence potentially cause a stack overflow when deeply nested.

This change removes the parsing context that's specific to bool query, introduced with #66204, in favour of generalizing the nested depth tracking to all query types.

The generic tracking is introduced by wrapping the parser and overriding the method that parses named objects through the xcontent registry. Another way would have been to require a context argument when parsing queries, which would mean adding a context argument to all the QueryBuilder#fromXContent static methods. That would be a breaking change for plugins that provide custom queries, hence I went for trying out a different approach.

One aspect that this change requires and introduces is the distinction between parsing a top level query (which will wrap the parser, or it would create the context if we had one), as opposed to parsing an inner query, which goes ahead with the given parser and context. We already have this distinction as we have two different static methods in `AbstractQueryBuilder` but in practice only bool query makes the distinction being the only context-aware query.

In addition to generalizing tracking nested depth when parsing queries, we should be able to adopt this same strategy to track queries usage as part of #90176.

Given that the depth check is now more restrictive, as it counts all compound queries and not only bool, we have decided to raise the default limit to `30` to ensure that users are not going to hit the limit due to this change.
2022-10-12 15:15:06 +02:00
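A rough sketch of the approach in #90425: depth is tracked inside the parser's named-object method, so the static query parsing methods need no extra context argument. All names here are hypothetical, and for brevity the sketch folds the wrapped parser and its wrapper into a single `DepthTrackingParser` class; queries are modelled as nested lists of inner queries.

```java
import java.util.List;

public class NestedDepthSketch {

    /** Hypothetical parser abstraction: parses one inner query and returns how many queries it contained. */
    interface Parser {
        int namedObject(List<?> body);
    }

    /** Stand-in for a static fromXContent-style method; note that it takes no context argument. */
    static int parseQuery(Parser parser, List<?> body) {
        int count = 1;
        for (Object inner : body) {
            count += parser.namedObject((List<?>) inner);
        }
        return count;
    }

    /** Tracks nesting for every query type because all inner queries pass through namedObject. */
    static final class DepthTrackingParser implements Parser {
        private final int maxDepth;
        private int depth;

        DepthTrackingParser(int maxDepth) {
            this.maxDepth = maxDepth;
        }

        @Override
        public int namedObject(List<?> body) {
            if (++depth > maxDepth) {
                throw new IllegalArgumentException("query is nested too deeply, limit is [" + maxDepth + "]");
            }
            try {
                return parseQuery(this, body);
            } finally {
                depth--;
            }
        }
    }

    /** Top-level entry point: the only place that creates the tracking parser. */
    static int parseTopLevel(List<?> body, int maxDepth) {
        return new DepthTrackingParser(maxDepth).namedObject(body);
    }

    public static void main(String[] args) {
        List<?> query = List.of(List.of(), List.of(List.of()));
        System.out.println("parsed " + parseTopLevel(query, 30) + " queries");
    }
}
```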
Alan Woodward 0013d46538
Extract Source interface from SourceLookup (#90762)
SourceLookup combines a mutable lookup object that can be advanced
to different documents with access to a document's source. This combination
can make reasoning about where a Source comes from difficult, particularly
in the FetchPhase where the source gets passed around a great deal.

This commit extracts a Source interface from SourceLookup, giving read-only
access to the source, and changes various FetchPhase interfaces to take this
read-only view instead of a full lookup. You can now tell easily if a consumer
of the source is going to try and move it to a different document. As part of this
change we add a new docId parameter to various ValueFetcher methods, as
previously this could be accessed via the SourceLookup.
2022-10-11 19:50:30 +01:00
Mark Tozzi 4a26dda50c
Use the AggTestConfig object in testCase (#90699) 2022-10-06 13:33:57 -04:00
Rene Groeschke 43a0377735
Update forbiddenapis to 3.4 (#90624)
Fix breaking changes to source validation after change in default jdk rule set
2022-10-06 16:52:06 +02:00
Mark Tozzi df27efcae4
Minor Aggregations Test Cleanup (#90530)
* remove unnecessary constructor from AggTestConfig

* deprecate methods that we want to discourage individual tests from invoking

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
2022-09-30 15:21:09 -04:00
Mark Tozzi 15932d5168
Refactor aggregator test case (#90149)
Refactor `AggregatorTestCase` to eliminate many overloads of `searchAndReduce`.  Introduce a parameter object, and default many common arguments.
2022-09-20 14:58:24 -04:00
Alan Woodward aed64a6c76
Add a TSID global ordinal to TimeSeriesIndexSearcher (#90035)
Rather than trying to compare BytesRefs in tsdb-related aggregations, it
will be much quicker if we can use a search-global ordinal to detect when
we have moved to a new TSID. This commit adds such an ordinal to the
aggregation execution context.
2022-09-14 15:32:17 +01:00
Martijn van Groningen 5195cba24e
Fix ParentToChildrenAggregatorTests#testBestDeferringCollectorWithSubAggOfChildrenAggNeedingScores() test failures. (#90052)
The failures reported in #90050 were caused by the fact that just a few
docs were indexed and the string_field had in total just one value in
the index.

The second fix addresses a cast in ValueSource.java line 285 that failed
because of test wrapping of the index reader. A DirectoryReader is expected
there, which is not the case if maybeWrap is true.

Closes #90050
2022-09-14 18:01:38 +09:30
Martijn van Groningen 9056ff7bc4
Fail when rebuilding scorer in breadth_first mode and query context has changed (#89993)
The children agg changes the query context. When BestBucketsDeferringCollector is
rebuilding scores for the breadth_first collect mode, this leads
to erroneous situations:
* A null scorer could be returned, because a segment had no matches.
* Scores for incorrect docids could be reported.

This commit adds checks for both cases and throws runtime errors with a more
actionable error message.

These erroneous situations could occur when top_hits is nested under a
children agg and a terms agg with breadth_first execution mode.
Possibly there are other cases too where this NPE would occur.

Note that this NPE would actually only occur if parent and child docs are in
separate segments, otherwise the scorer would report scores for different documents.
This would trigger an assertion error in tests.

Closes #37650
2022-09-13 08:53:39 +02:00
Nik Everett 79a89790e3
Synthetic source: load text from stored fields (#87480)
Adds support for loading `text` and `keyword` fields that have
`store: true`. We could likely load *any* stored fields, but I
wanted to blaze the trail using something fairly useful.
2022-08-17 10:18:36 -04:00
Jack Conradson 5e0701f026
Add source fallback for keyword fields using operation (#88735)
This change adds an operation parameter to FieldDataContext that allows us to specialize the field data that are returned from fielddataBuilder in MappedFieldType. Keyword, integer, and geo point field types now support source fallback where we build a doc values wrapper using source if doc values don't exist for this field under the operation SCRIPT. This allows us to have source fallback in scripting for the scripting fields API.
2022-07-28 10:34:05 -07:00
Alan Woodward bc8ebbf540
Add FieldDataContext (#88779)
MappedFieldType#fieldDataBuilder() currently takes two parameters, a fully qualified
index name and a supplier for a SearchLookup. We expect to add more parameters here
as we add support for loading fielddata from source. Rather than telescoping the
parameter list, this commit instead introduces a new FieldDataContext carrier object
which will allow us to add to these context parameters more easily.
2022-07-26 14:47:50 +01:00
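The carrier-object refactoring in #88779 can be sketched as below. The two record components mirror the two parameters named in the commit message, but the `SearchLookup` placeholder, the component names, and the `fielddataBuilder` method here are hypothetical simplifications.

```java
import java.util.function.Supplier;

public class FieldDataContextSketch {

    /** Hypothetical placeholder for the lookup type. */
    interface SearchLookup {}

    /** Carrier object: new context can be added here without touching every method signature. */
    record FieldDataContext(String fullyQualifiedIndexName, Supplier<SearchLookup> lookupSupplier) {}

    /** Takes one context parameter instead of a growing (telescoping) parameter list. */
    static String fielddataBuilder(FieldDataContext context) {
        return "fielddata builder for index [" + context.fullyQualifiedIndexName() + "]";
    }

    public static void main(String[] args) {
        FieldDataContext context = new FieldDataContext("cluster:index-1", () -> new SearchLookup() {});
        System.out.println(fielddataBuilder(context));
    }
}
```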