While dot expansion is disabled when parsing percolator queries at index
time, as that would interfere with query parsing, we still use a wrapper parser
that is conservative about what methods it supports, assuming that
document parsing needs nextToken and not much more. It turns out that when
parsing queries instead, we need to support all the XContentParser
methods, including map, list, etc.
This commit adds a test for script score query parsing through document
parsing via the percolator field mapper, and removes the limitations in the
wrapper parser used when dot expansion is disabled.
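Illustratively (a sketch, not the actual ES class), the wrapper used to reject anything beyond token stepping:

```java
import java.io.IOException;
import java.util.Map;

import org.elasticsearch.xcontent.FilterXContentParserWrapper;
import org.elasticsearch.xcontent.XContentParser;

// before: the wrapper overrode methods like map() to throw, assuming document
// parsing only ever steps tokens via nextToken()
class ConservativeWrapper extends FilterXContentParserWrapper {
    ConservativeWrapper(XContentParser delegate) {
        super(delegate);
    }

    @Override
    public Map<String, Object> map() throws IOException {
        throw new UnsupportedOperationException(); // broke query parsing in the percolator
    }
}
// after: such overrides are removed, so map(), list(), etc. simply delegate to the wrapped parser
```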
Similar to the TransportVersions holder class, IndexVersions is the new
place to contain all constants for IndexVersion. This commit moves all
existing constants to the new class. It is purely mechanical.
Follow-up to #100966, introducing a new combined assertion, `assertSearchHitsWithoutFailures`,
to combine no-failure, count, and id assertions into one block.
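For illustration, a usage sketch (the exact helper signature is assumed from the description; `termQuery` is the standard `QueryBuilders` helper):

```java
assertSearchHitsWithoutFailures(
    client().prepareSearch("test").setQuery(termQuery("field", "value")),
    "doc-1", "doc-2");
// previously three separate assertions:
// assertNoFailures(response); assertHitCount(response, 2); assertSearchHits(response, "doc-1", "doc-2");
```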
We'd like to make `SearchResponse` reference counted and pooled but there are around 6k
instances of tests that create a `SearchResponse` local variable that would need to be
released manually to avoid leaks in the tests.
This does away with about 10% of these spots by adding an override for `assertHitCount`
that handles the actual execution of the search request and its release automatically,
and by making use of it in all spots where the `.get()` on the request builder could be inlined
semi-automatically, in a straightforward fashion, without other code changes.
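A minimal sketch of the overload's shape (assumed, not the exact ES code):

```java
public static void assertHitCount(SearchRequestBuilder searchRequestBuilder, long expectedHitCount) {
    var response = searchRequestBuilder.get(); // runs the formerly inlined .get() for the caller
    try {
        assertHitCount(response, expectedHitCount);
    } finally {
        response.decRef(); // the release happens here, so test code cannot leak the response
    }
}
```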
Another round of automated fixes to this, marking things that can be
made static as static. Saves some JIT cycles but also turns some lambdas
from capturing to non-capturing and makes the "utilityness" of some
classes visible.
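A self-contained illustration of the lambda point:

```java
import java.util.function.Supplier;

class StaticDemo {
    private static String staticGreeting() { return "hello"; }
    private String instanceGreeting() { return "hello"; }

    void demo() {
        Supplier<String> nonCapturing = StaticDemo::staticGreeting; // non-capturing: the JVM can reuse one instance
        Supplier<String> capturing = this::instanceGreeting;        // captures `this`: allocated each time this runs
    }
}
```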
Data-stream mappings require a @timestamp field to be present and configured
as a date with a specific set of parameters. The index-wide setting of
ignore_malformed can cause problems here if it is set to true, because it needs
to be false for the @timestamp field.
This commit detects if a set of mappings is configured for a data stream by checking
for the presence of a DataStreamTimestampFieldMapper metadata field, and passes
that information on during Mapper construction as part of the MapperBuilderContext.
DateFieldMapper.Builder now checks to see if it is specifically for a data stream timestamp
field, and if it is, sets ignore_malformed to false.
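In spirit, the resolution looks like this (a hedged sketch; the helper below is hypothetical, the real logic lives in DateFieldMapper.Builder):

```java
boolean resolveIgnoreMalformed(boolean indexWideIgnoreMalformed, boolean isDataStreamTimestampField) {
    if (isDataStreamTimestampField) {
        return false; // @timestamp in a data stream must reject malformed dates
    }
    return indexWideIgnoreMalformed; // everything else follows the index-wide setting
}
```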
Relates to #96051
The copyTo builder is really hard to reason about when it comes to
mapper merging, because the `reset` method would actually mutate an
existing mapper. That seems dangerous, and the whole thing is quite
inefficient as well. This PR removes it and uses a copy
constructor for copy-on-write, avoiding instance creation on mapper
merges here and there and leaving no doubt that these things are
immutable.
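A minimal sketch of the copy-on-write shape this moves to (illustrative names, not the actual CopyTo code):

```java
import java.util.ArrayList;
import java.util.List;

final class CopyTo {
    private final List<String> copyToFields;

    CopyTo(List<String> copyToFields) {
        this.copyToFields = List.copyOf(copyToFields); // defensively immutable
    }

    // merging produces a new instance instead of mutating this one via reset(...)
    CopyTo withAddedFields(List<String> extra) {
        if (extra.isEmpty()) {
            return this; // no allocation when nothing changes
        }
        List<String> merged = new ArrayList<>(copyToFields);
        merged.addAll(extra);
        return new CopyTo(merged);
    }
}
```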
Constants for TransportVersion currently live alongside the class
definition. This has been fine since there was only one set of
constants. However, to support serverless, some constants will need to
be defined elsewhere.
This commit moves the existing constants to a new holder class,
TransportVersions. It is almost entirely mechanical, using IntelliJ's move
members refactoring. The only non-mechanical part was slightly shifting how CURRENT
is found, by defining a LATEST in TransportVersions that is automatically
calculated (since we already have it, there is no need to define it manually).
The `StreamOutput` and `StreamInput` APIs are designed so that code
which serializes objects to the transport protocol aligns closely with
the corresponding deserialization code. However today
`StreamOutput#writeCollection` pairs up with a variety of methods on
`StreamInput`, including `readList`, `readSet`, and so on. These methods
are not obviously compatible with `writeCollection` unless you look at
the implementation, and that makes verifying transport protocol code
harder than it needs to be.
This commit renames these methods to `readCollectionAsList`,
`readCollectionAsSet`, and so on, to clarify that they are compatible
with `writeCollection`.
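After the rename, the write and read sides of a `Writeable` visibly pair up, e.g. (`MyNodes` is a hypothetical example type):

```java
import java.io.IOException;
import java.util.List;

import org.elasticsearch.cluster.node.DiscoveryNode;
import org.elasticsearch.common.io.stream.StreamInput;
import org.elasticsearch.common.io.stream.StreamOutput;
import org.elasticsearch.common.io.stream.Writeable;

record MyNodes(List<DiscoveryNode> nodes) implements Writeable {
    MyNodes(StreamInput in) throws IOException {
        this(in.readCollectionAsList(DiscoveryNode::new)); // obviously the counterpart of writeCollection below
    }

    @Override
    public void writeTo(StreamOutput out) throws IOException {
        out.writeCollection(nodes); // size header followed by each element
    }
}
```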
Relates
https://github.com/elastic/elasticsearch/pull/98971#issuecomment-1697289815
An optimization introduced in https://github.com/elastic/elasticsearch/pull/81985
changed percolator query behavior.
Users can specify a percolator query which expands fields based on a
wildcard pattern. Just one example is `simple_query_string`, which
allows field names like `"text_*"`. The user expects that this field
name will expand to relevant mapped fields (e.g. "text_foo"). However,
if there are no documents indexed in those fields at the time when the
percolator query is indexed, it doesn't expand to the relevant fields.
Additionally, at query time, we may skip expanding fields and fail to match
the relevant mapped fields if they are considered "empty" (e.g. they have no
values in the shard). We should instead allow expansion by indicating
that the field may exist in the shard.
closes: https://github.com/elastic/elasticsearch/issues/98819
When the subobjects property is set to false and we encounter an object
while parsing, we need a way to know whether its FieldMapper is able to
parse an object. If that's the case, we can provide the entire object to
the FieldMapper; otherwise its name becomes part of the dotted field
name of each internal value.
This is achieved by adding a `supportsParsingObject()` method
to the `FieldMapper` class. This method defaults to `false`, since the
majority of FieldMappers do not support parsing objects, and is
overridden to return `true` by the ones that do.
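Roughly (simplified stand-ins for the real classes):

```java
abstract class FieldMapper {
    /** Whether this mapper can consume a whole object from the parser. */
    public boolean supportsParsingObject() {
        return false; // most field mappers only parse leaf values
    }
}

class ObjectConsumingFieldMapper extends FieldMapper {
    @Override
    public boolean supportsParsingObject() {
        return true; // the entire object is handed to this mapper's parse method
    }
}
```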
This change swaps test code that directly creates IndexSearcher instances for LuceneTestCase#newSearcher calls,
which have the advantage of randomly enabling concurrency and of randomly applying assertion wrappers internally.
While this doesn't guarantee testing the concurrent code path, it should generally increase the likelihood of doing so.
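Concretely, inside a `LuceneTestCase` subclass the swap is just:

```java
IndexSearcher searcher = newSearcher(reader); // may randomly use an executor and assertion wrappers
// instead of the fixed, never-concurrent:
// IndexSearcher searcher = new IndexSearcher(reader);
```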
Lots of spots where we did weird things around streams: redundant stream creation, redundant collecting
before adding all the collected elements to another collection, redundant streams for joining strings,
use of the less efficient `Collectors.toList`, and in a few cases incorrect reliance on the result being mutable.
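A few self-contained before/after pairs of the kind described (`Node` is a stand-in type):

```java
import java.util.List;
import java.util.stream.Collectors;

class StreamCleanups {
    record Node(String name) {}

    // before: collects into a mutable list that is never mutated
    static List<String> namesBefore(List<Node> nodes) {
        return nodes.stream().map(Node::name).collect(Collectors.toList());
    }

    // after: cheaper, and explicitly unmodifiable
    static List<String> namesAfter(List<Node> nodes) {
        return nodes.stream().map(Node::name).toList();
    }

    // before: a stream just to join strings
    static String joinedBefore(List<String> items) {
        return items.stream().collect(Collectors.joining(","));
    }

    // after: no stream machinery needed
    static String joinedAfter(List<String> items) {
        return String.join(",", items);
    }
}
```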
Drying this up further and adding the same shortcut for single node
tests. Dealing with most of the spots that I could grab via automatic
refactorings.
Replacing the remaining usages that I could automatically replace
and a couple that I did by hand in this PR.
Also, added the same shortcut to the single node tests to save some
duplication there.
This commit changes access to the latest TransportVersion constant to
use a static method instead of a public static field. By encapsulating
the field we will be able to (in a followup) lazily determine what the
latest is, outside of clinit.
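A sketch of the encapsulation using the class-holder idiom (names illustrative; making it lazy is the follow-up):

```java
public final class TransportVersion {
    public static TransportVersion current() {
        return CurrentHolder.CURRENT; // callers no longer touch a public static field
    }

    // holder class: CURRENT is computed when current() is first called,
    // outside TransportVersion's own static initializer
    private static final class CurrentHolder {
        private static final TransportVersion CURRENT = computeLatest();
    }

    private static TransportVersion computeLatest() {
        return new TransportVersion(); // illustrative placeholder for the real lookup
    }
}
```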
Motivated by looking into allocations of listeners in detail for shared cache benchmarking.
Wrapping a listener and using `listener::onFailure` as the failure callback means that we
have a reference to the listener from both the failure and the response handler.
If we use the approach used by the `.delegate*` methods, we can often save allocating
a response handler lambda, or at least make the response handler cheaper.
We also save allocating the failure handler lambda.
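Roughly the difference (`client`, `request`, and `convert` are stand-ins; `delegateFailure` is the existing `ActionListener` method):

```java
// before: two lambdas, both holding a reference to `listener`
client.search(request, ActionListener.wrap(
    response -> listener.onResponse(convert(response)),
    listener::onFailure));

// after: the delegate carries the failure path; only the response handler remains
client.search(request, listener.delegateFailure(
    (delegate, response) -> delegate.onResponse(convert(response))));
```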
Add JUnit matchers allowing other matchers to be transformed using a
function. This can make checking properties of lists/arrays a lot more
fluent using nested matchers, rather than checking individual items
one by one. These replace the custom `ElasticsearchMatchers` with
more generic ones.
As a basic example, it allows you to turn this:
assertThat(list.get(0).getName(), equalTo("foo"));
assertThat(list.get(1).getName(), equalTo("bar"));
assertThat(list.get(2).getName(), equalTo("quux"));
into this:
assertThat(list, transformedItems(Item::getName, contains("foo", "bar", "quux")));
Doing this 'properly' without these helpers requires defining your own
matchers, which is very cumbersome.
I've applied the new methods to `ElasticsearchAssertions` and a few
other classes to show various use cases.
Most relevant changes:
- add api to allow concurrent query rewrite (GITHUB-11838 Add api to allow concurrent query rewrite apache/lucene#11840)
- knn query rewrite (Concurrent rewrite for KnnVectorQuery apache/lucene#12160)
- Integrate the incubating Panama Vector API (Integrate the Incubating Panama Vector API apache/lucene#12311)
As part of this commit I moved the ES codebase off of overriding or relying on the deprecated rewrite(IndexReader) method in favour of using rewrite(IndexSearcher) instead. For score functions, I opted not to break existing plugins, and instead create a new IndexSearcher whenever we rewrite a filter; otherwise we'd need to change the ScoreFunction#rewrite signature to take a searcher instead of a reader.
Co-authored-by: ChrisHegarty <christopher.hegarty@elastic.co>
This changes the serialization format for queries - when the index version is >=8.8.0, it serializes the actual transport version used into the stream. For BwC with old query formats, it uses the mapped TransportVersion for the index version.
This can be modified later if needed to re-interpret the vint used to store TransportVersion to something else, allowing the format to be further modified if necessary.
Fixes #82794. Upgrade the spotless plugin, which addresses the issue
around formatting `instanceof` expressions. Formatting of statements
including lambdas seems to have improved too.
Document parsing methods currently throw MapperParsingException. This
isn't very helpful, as it doesn't contain any information about where the parse
error happened - it is designed for parsing mappings, which are realised into
Java maps before being examined. This commit introduces a new exception
specifically for document parsing that extends XContentException, so that
it reports the current position of the parser as part of its error message.
Fixes #85083
Since #91269, fetch phase subprocessors can report any stored fields that they
need back to the top-level FetchPhase so that all stored fields can be loaded
up front. This commit switches the unified and plain highlighters to use this
functionality so that highlighting does not need to open stored field readers
twice.
The most recent Lucene update made `StringField` more efficient than `Field`
when indexing simple keywords. This PR cuts over remaining places where we use
`Field` to index keywords to `StringField` instead.
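For example (Lucene APIs):

```java
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.FieldType;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.IndexOptions;

class KeywordIndexing {
    static void addKeyword(Document doc) {
        // before: a generic Field with a hand-rolled keyword-like FieldType
        FieldType keywordType = new FieldType();
        keywordType.setTokenized(false);
        keywordType.setIndexOptions(IndexOptions.DOCS);
        keywordType.freeze();
        doc.add(new Field("status", "active", keywordType));

        // after: StringField, which recent Lucene indexes more efficiently
        doc.add(new StringField("status", "active", Field.Store.NO));
    }
}
```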
Follow-up to #94213 dealing with the remaining spots.
This also moves some use of deprecated transient settings to
persisted settings and deprecates the request builder methods for
transient settings to hopefully prevent more use of transients.
The percolator parses the query into a QueryBuilder, and then manually walks the query tree to
ensure that the query is supported. This requires instanceof checks that are aware of all the compound queries
and may easily become outdated.
With #90425 we can instead rely on checking the query validity directly while parsing, by providing a consumer
that gets notified for each inner query that gets parsed.
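A hedged sketch of the parse-time check (the consumer-accepting entry point's exact signature is assumed):

```java
QueryBuilder queryBuilder = parseTopLevelQuery(parser, inner -> {
    if (inner instanceof HasChildQueryBuilder) {
        throw new IllegalArgumentException("[has_child] queries are not supported in a percolator query");
    }
});
```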
This was only needed because the percolator uses a MemoryIndex which did
not support stored fields, and so when it ran a highlighting phase it needed to
force it to read from source. MemoryIndex added stored fields support in
Lucene 9.5, so we can remove this internal parameter.
The parameter remains available, but deprecated, via the rest layer, and no
longer has any effect.
Use the locale-independent `Strings.format` method instead of `String.format(Locale.ROOT, ...)`.
Inline `ESTestCase.forbidden` calls with `Strings.format` for consistency's sake.
Add a `Strings.format` alias in `common.Strings`.
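Before/after (assuming the helper lives in `org.elasticsearch.core.Strings`):

```java
import java.util.Locale;

import org.elasticsearch.core.Strings;

class FormatExamples {
    static String before(String shardId, int retries) {
        return String.format(Locale.ROOT, "shard [%s] failed after [%d] retries", shardId, retries);
    }

    static String after(String shardId, int retries) {
        return Strings.format("shard [%s] failed after [%d] retries", shardId, retries);
    }
}
```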
This commit adds a new test framework for configuring and orchestrating
test clusters for both Java and YAML REST testing. This will eventually
replace the existing "test-clusters" Gradle plugin and the build-time
cluster orchestration.
Loading of stored fields is currently handled directly in FetchPhase, with
some fairly complex logic examining various bits of the FetchContext to work
out what fields need to be loaded. This is further complicated by synthetic
source, which may have its own stored field requirements.
This commit tries to separate out these concerns a little by adding a new
StoredFieldsSpec record that holds information about which stored fields
need to be loaded. Each FetchSubPhaseProcessor can now report a
StoredFieldsSpec detailing what its requirements are, and these specs can
be merged together, along with requirements from a SourceLoader, to
determine up-front what fields should be loaded by the StoredFieldLoader.
The stored fields themselves are added into the SearchHit by a new
StoredFieldsPhase, which handles alias resolution and value post-processing.
The logic to determine when source should be loaded and
when not, based on the presence of script fields or stored fields, is
moved into FetchContext, which highlights some inconsistencies that
can be fixed in follow-up commits.
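A minimal sketch of the record's shape (field names follow the description above; the real spec carries more):

```java
import java.util.HashSet;
import java.util.Set;

record StoredFieldsSpec(boolean requiresSource, Set<String> requiredStoredFields) {
    // specs from each FetchSubPhaseProcessor (and the SourceLoader) merge into one
    StoredFieldsSpec merge(StoredFieldsSpec other) {
        Set<String> fields = new HashSet<>(requiredStoredFields);
        fields.addAll(other.requiredStoredFields);
        return new StoredFieldsSpec(requiresSource || other.requiresSource, fields);
    }
}
```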
We currently work out whether or not a mapper should be storing additional
values for synthetic source by looking at the DocumentParserContext. However,
this value does not change for the lifetime of the mapper - it is defined by
metadata on the root mapper and is immutable - and DocumentParserContext
feels like the wrong place for this information as it holds context specific
to the document being parsed.
This commit moves synthetic source status information from DocumentParserContext
to MapperBuilderContext instead. Mappers which need this information retrieve
it at build time and hold it on final fields.
#91043 surfaced some inconsistencies in field name validation between mapping parsing and document parsing. This commit centralizes the validation of field names when parsing mappings into a single place, and attempts to address some of the inconsistencies.
- field names that contain only whitespaces are no longer accepted in mappings. It was previously possible to map a field containing only whitespaces, but a document containing such a field would be rejected. We reject such mappings only for indices created from 8.6 on, since existing indices may already contain such fields. This also holds for dotted fields like `top. .foo` when subobjects are enabled.
- A clear error message is now thrown when mappings hold fields with names made of dots only; an ArrayIndexOutOfBoundsException was thrown before.
- The error thrown when a field name is empty is now unified with the one thrown when an empty field name is provided as part of a document (field name cannot be an empty string).
- When parsing documents (with subobjects set to false), distinguish between the error thrown when a field name is empty and that thrown when a field name is made of whitespaces only
- When parsing documents (with subobjects set to false), accept field names that are made of dots only (these are already accepted in mappings); this effectively reverts #90950
We have two implementations of source filtering, one based on Map filtering
and used by SourceLookup, and one based on jackson stream filtering used
in Get and SourceFieldMapper. There are cases when stream filtering could
be usefully applied to source in the fetch phase, for example if the source is
not being used as a Map by any other subphase; and correspondingly if a
source has already been parsed to a Map then map filtering will generally
be more efficient than stream filtering that ends up re-parsing the bytes.
This commit encapsulates all of this filtering logic into a single SourceFilter
class, which can be passed to the filter method on Source. Different
Source implementations can choose to use map or stream filtering
depending on whether or not they have map or bytes representations
available.
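A hypothetical usage sketch (constructor shape assumed):

```java
SourceFilter filter = new SourceFilter(
    new String[] { "user.*" },          // includes
    new String[] { "user.password" });  // excludes
Source filtered = source.filter(filter);
// a map-backed Source filters its already-parsed map; a bytes-backed Source
// streams the bytes through filtering without building a map first
```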
Archive indices < 6.1 do not work together with nested fields. While the documentation makes sure not to claim support for nested fields (see https://www.elastic.co/guide/en/elasticsearch/reference/current/archive-indices.html#archive-indices-supported-field-types), this leads to a situation where the import still works, yet none of the documents are returned by any query (not even match_all). This is because indices before 6.1 did not have primary terms, but the default search context in ES 8 adds a FieldExistsQuery(SeqNoFieldMapper.PRIMARY_TERM_NAME) filter to the query:
f56126089c/server/src/main/java/org/elasticsearch/search/DefaultSearchContext.java (L284)
This PR fixes the issue by adding basic support for nested fields, so that at least queries that do not leverage
the nested documents can work (i.e. it allows extracting the documents, e.g. to be reindexed).
Closes #90523
Rather than creating a new SourceLookup for each HitContext, and then
setting a source provider on it after the fact, we instead just take a
Source as a constructor argument.
This commit also adds three Source implementations, `fromBytes` and
`fromMap` to hold pre-loaded data, and `lazyLoading` which will load
the source only if asked for, and tidies up FetchSourcePhase to use them.
When parsing queries on the coordinating node, there is currently no way to share state between the different parsing methods (`fromXContent`). The only query that supports a parse context is bool query, which uses the context to track the nested depth of queries, added with #66204. Such a nested depth tracking mechanism is not 100% accurate, as it tracks bool queries only, while there are many more query types that can hold other queries and hence potentially cause a stack overflow when deeply nested.
This change removes the parsing context that's specific to bool query, introduced with #66204, in favour of generalizing the nested depth tracking to all query types.
The generic tracking is introduced by wrapping the parser and overriding the method that parses named objects through the xcontent registry. Another way would have been to require a context argument when parsing queries, which would mean adding a context argument to all the QueryBuilder#fromXContent static methods. That would be a breaking change for plugins that provide custom queries, hence I went for a different approach.
One aspect that this change requires and introduces is the distinction between parsing a top-level query (which will wrap the parser, or would create the context if we had one), as opposed to parsing an inner query, which goes ahead with the given parser and context. We already have this distinction in the form of two different static methods in `AbstractQueryBuilder`, but in practice only bool query makes the distinction, being the only context-aware query.
In addition to generalizing the tracking of nested depth when parsing queries, we should be able to adopt this same strategy to track query usage as part of #90176.
Given that the depth check is now more restrictive, as it counts all compound queries and not only bool, we have decided to raise the default limit to `30` to ensure that users are not going to hit the limit due to this change.
SourceLookup combines a mutable lookup object that can be advanced
to different documents with access to a document's source. This combination
can make reasoning about where a Source comes from difficult, particularly
in the FetchPhase where the source gets passed around a great deal.
This commit extracts a Source interface from SourceLookup, giving read-only
access to the source, and changes various FetchPhase interfaces to take this
read-only view instead of a full lookup. You can now tell easily if a consumer
of the source is going to try and move it to a different document. As part of this
change we add a new docId parameter to various ValueFetcher methods, as
previously this could be accessed via the SourceLookup.
We fixed a couple of issues lately around duplicate empty and unmodifiable collection wrappers
polluting heaps, so this tries to bulk-fix all obvious spots where we can
more efficiently read the immutable collection directly.
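Typical before/after:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

class CopyExamples {
    // before: an extra mutable copy plus an unmodifiable wrapper on every call
    static Map<String, String> before(Map<String, String> source) {
        return Collections.unmodifiableMap(new HashMap<>(source));
    }

    // after: Map.copyOf may return `source` itself when it is already an immutable copy,
    // and empty results share a singleton
    static Map<String, String> after(Map<String, String> source) {
        return Map.copyOf(source);
    }
}
```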