elasticsearch

Commit Graph

Author	SHA1	Message	Date
David Turner	1eda6ac74b	Extract ESIntegTestCase#prepareSearch (#101175 ) Relates #101172	2023-10-20 06:18:58 -04:00
Alan Woodward	edab22a31c	Consistent scores for multi-term SourceConfirmedTestQuery (#100846 ) SourceConfirmedTestQuery uses a QueryVisitor to collect terms from its inner query to build its internal SimScorer. It is important to hold these terms in a consistent order so that when scores for each term are summed, the order of summation is the same as it would be for the inner query. This commit changes the call to visit to use a LinkedHashSet to ensure that terms are iterated in the order in which they are collected. Fixes #98712	2023-10-16 10:11:10 +01:00
Armin Braun	b7eafce32c	Make some practically static methods static (#97565 ) Another round of automated fixes to this, marking things that can be made static as static. Saves some JIT cycles but also turns some lambdas from capturing to non-capturing and makes the "utilityness" of some classes visible.	2023-10-06 23:37:07 +02:00
Alan Woodward	bb5ed9899b	Implement matches() on SourceConfirmedTextQuery (#100134 ) `match_only_text` does not currently support highlighting via the matches option of the default highlighter. This commit implements matches on the backing query for this field, and also fixes a bug where the field type's value fetcher could hold on to the wrong reference for a source lookup, causing threading errors.	2023-10-04 10:03:48 +01:00
Alan Woodward	4e1fb3fca5	Automatically disable `ignore_malformed` on datastream `@timestamp` fields (#99346 ) Data-stream mappings require a @timestamp field to be present and configured as a date with a specific set of parameters. The index-wide setting of ignore_malformed can cause problems here if it is set to true, because it needs to be false for the @timestamp field. This commit detects if a set of mappings is configured for a datastream by checking for the presence of a DataStreamTimestampFieldMapper metadata field, and passes that information on during Mapper construction as part of the MapperBuilderContext. DateFieldMapper.Builder now checks to see if it is specifically for a data stream timestamp field, and if it is, sets ignore_malformed to false. Relates to #96051	2023-09-13 15:02:22 +01:00
Armin Braun	574fb05946	Deduplicate org.apache.lucene.document.FieldType instances across mappers (#99361 ) We mostly have a handful of `FieldType` values here across all mappers and none of them contain attributes. There are only so many combinations here, so let's deduplicate these to save some heap and set up subsequent mapper heap savings.	2023-09-08 22:18:35 +02:00
Armin Braun	f1a376c317	Remove CopyTo.Builder (#99368 ) The copyTo builder is really hard to reason about when it comes to mapper merging, because the `reset` method would actually mutate an existing mapper. That seems dangerous and the whole thing is quite inefficient as well. -> this PR just removes it and uses a copy constructor for copy on write, avoiding instance creation on mapper merges here and there and leaving no doubt about these things being immutable.	2023-09-08 13:24:31 -04:00
Ryan Ernst	19257125b1	Move transport version constants to TransportVersions (#97990 ) Constants for TransportVersion currently live alongside the class definition. This has been fine since there was only one set of constants. However, to support serverless, some constants will need to be defined elsewhere. This commit moves the existing constants to a new holder class, TransportVersions. It is almost entirely mechanical, using IntelliJ move members. The only non-mechanical part was slightly shifting how CURRENT is found, defining a LATEST in TransportVersions that is automatically calculated (since we already have it, no need to manually define it).	2023-09-06 15:14:41 -04:00
Matteo Piergiovanni	e719057209	Explicit parsing object capabilities of FieldMappers (#98684 ) When the subobject property is set to false and we encounter an object while parsing, we need a way to understand if its FieldMapper is able to parse an object. If that's the case we can provide the entire object to the FieldMapper; otherwise its name becomes part of the dotted field name of each internal value. This has been achieved by adding the `supportsParsingObject()` method to the `FieldMapper` class. This method defaults to `false` since the majority of FieldMappers do not support parsing objects, and is overridden to return `true` by the ones that do support objects.	2023-08-22 10:16:59 +02:00
David Turner	dadaaa8315	AwaitsFix for #98712	2023-08-22 09:11:21 +01:00
Christoph Büscher	207a995fce	Use newSearcher instead of new IndexSearcher in tests where possible (#98110 ) This change swaps test code that directly creates IndexSearcher instances with LuceneTestCase#newSearcher calls that have the advantage of randomly using concurrency and also randomly use assertion wrappers internally. While this doesn't guarantee testing the concurrent code path, it should generally increase the likelihood of doing so.	2023-08-22 10:49:21 +07:00
tmgordeeva	171bcbb3e1	Mapped field types searchable with doc values (#97724 ) When using TSDB, time series metrics aren't indexed but do have doc values. Field caps should report those fields as searchable.	2023-08-15 19:29:49 -07:00
eyalkoren	3d36b08d28	Fix `fields` API with `subobjects: false` (#97092 )	2023-07-12 11:35:18 +03:00
Martijn van Groningen	55561588f5	Fix mapping parsing logic to determine synthetic source is active. (#97355 ) Take index mode into account during parsing of the mapping when determining whether source is synthetic Fixes #97320	2023-07-06 06:42:14 -04:00
Simon Cooper	a873e26cf7	Convert IndexVersion.CURRENT to a method with a pluggable interface (#97132 )	2023-06-27 14:47:32 +01:00
Armin Braun	3f8ee82ef8	Use indices admin client shortcut in most integration tests (#96946 ) Replacing the remaining usages that I could automatically replace and a couple that I did by hand in this PR. Also, added the same shortcut to the single node tests to save some duplication there.	2023-06-20 13:32:59 +02:00
Simon Cooper	71c12262fb	Migrate index created version to IndexVersion (#96066 )	2023-06-14 09:43:31 +01:00
Luca Cavanna	e5768d9335	Upgrade Lucene to a 9.7.0 snapshot (#96433 ) Most relevant changes: - add api to allow concurrent query rewrite (GITHUB-11838, apache/lucene#11840) - knn query rewrite (Concurrent rewrite for KnnVectorQuery, apache/lucene#12160) - Integrate the incubating Panama Vector API (apache/lucene#12311) As part of this commit I moved the ES codebase off of overriding or relying on the deprecated rewrite(IndexReader) method in favour of using rewrite(IndexSearcher) instead. For score functions, I went for not breaking existing plugins and creating a new IndexSearcher whenever we rewrite a filter, otherwise we'd need to change the ScoreFunction#rewrite signature to take a searcher instead of a reader. Co-authored-by: ChrisHegarty <christopher.hegarty@elastic.co>	2023-05-31 10:17:10 +02:00
Mayya Sharipova	433ce88852	Rank_feature field null_value test and small edits (#96392 ) Correct and add more tests for adding null_value parameter for the rank_feature field. Relates to #95811, closes #95149	2023-05-30 07:33:40 -04:00
Marantidis Kiriakos	e44edcebf0	Add null_value for rank_feature field Closes #95149	2023-05-23 12:52:27 -04:00
Rory Hunter	fe1083f6c5	Upgrade spotless plugin to 6.17.0 (#94994 ) Fixes #82794. Upgrade the spotless plugin, which addresses the issue around formatting `instanceof` expressions. Formatting of statements including lambdas seems to have improved too.	2023-04-04 10:03:32 +01:00
Alan Woodward	093e36c875	Introduce DocumentParsingException (#92646 ) Document parsing methods currently throw MapperParsingException. This isn't very helpful, as it doesn't contain any information about where the parse error happened - it is designed for parsing mappings, which are realised into java maps before being examined. This commit introduces a new exception specifically for document parsing that extends XContentException, so that it reports the current position of the parser as part of its error message. Fixes #85083	2023-03-31 12:14:19 +01:00
Alan Woodward	131da70321	ValueFetchers now return a StoredFieldsSpec (#94820 ) This allows us to be more conservative about what needs to be loaded when using the fields API, and opens up the possibility of avoiding using stored fields or source altogether if we can use doc values to fetch values. This commit also uses this new information from ValueFetchers to more efficiently preload stored fields for the `fields` API, while still allowing the lazy loading of individual fields if they are asked for by scripts or runtime fields which cannot be introspected.	2023-03-30 10:46:43 +01:00
Simon Cooper	56d53da381	Migrate LuceneDocument.getFields(String) to a List (#94830 )	2023-03-29 11:08:36 +01:00
Alan Woodward	35da97214c	Make IndexAnalyzers an interface (#94819 ) IndexAnalyzers is currently always a concrete class wrapping several Maps of NamedAnalyzers. This means that whenever it is used it needs to instantiate all of its component analyzers, making testing much heavier than it needs to be. It also means that things like overriding analysis for legacy indexes is pushed into mapper parameters, rather than being handled in a single place. This commit makes IndexAnalyzers into an interface, with an anonymous concrete implementation that handles reloading and closing for index shards.	2023-03-28 16:07:08 +01:00
Adrien Grand	b56c2df203	Upgrade to lucene-9.6.0-snapshot-f5d1e1c787c. (#94494 )	2023-03-16 16:49:54 +01:00
Benjamin Trent	bc2755f0df	Fix NPE thrown by prefix and regex query in strange scenarios (#94369 ) In certain scenarios, running a MultiTerm query sets a `null` rewrite method. While `null` is usually checked, there are branches in the code where this is not adequately checked. Additionally, `MultiTermQuery#setRewriteMethod` has been deprecated for a while. So, to correct this bug, - Remove calls to `MultiTermQuery#setRewriteMethod` where possible - Always check for `null` rewrite method closes: https://github.com/elastic/elasticsearch/issues/94364	2023-03-08 09:36:17 -05:00
Craig Taverner	e7a2c44bbf	Support position time_series_metric on geo_point fields (#93946 ) Added position time_series_metric: * start creating position time_series_metric * Add yaml tests for queries and aggs * Disallow multi-values for geo_point as ts-metric * Limit running on older versions, some parts of the time-series syntax were not supported on all versions * ScaledFloatFieldMapper does not support POSITION, We should only test it against COUNTER and GAUGE, since it only supports those two metric types * Expand unit tests and allow parsing of dimension. We expand the tests to cover all cases tested in DoubleFieldMapperTests which also tests the behaviour of setting the dimension to true or false, so we enable parsing that for symmetry, but reject `true` as illegal for geo_point. * Add unit tests for position metric multi-values	2023-03-01 12:57:06 +01:00
Alan Woodward	e0fb33a4a5	Remove uses of deprecated LeafReader#document() method (#93984 ) Lucene has deprecated LeafReader#document() in favour of a new LeafReader#storedFields() method. This commit updates all places in elasticsearch that use the deprecated API. Relates to #94005	2023-02-24 12:16:28 +00:00
Martijn van Groningen	df4a8f72c8	Don't treat counter fields outside of tsdb as counters. (#93800 ) Fields that have the time_series_metric attribute set to counter in non-tsdb indices should use the number value source type instead of the counter value source type, essentially not handling these fields as counters at search time. Relates to #93539	2023-02-20 08:03:21 +01:00
Christoph Büscher	a3f4f0bb21	Fix rank_features parsing for dots in feature name (#93756 ) Currently, parsing a rank_features field where the key of the feature contains a dot leads to hard-to-understand JSON parsing errors because we interpret the dot in the feature name to represent the start of a new object. This change allows parsing those dots but throws a more legible IAE in case we encounter a dot in a feature name. We shouldn't allow dots in feature names because it can create ambiguity around mappings with more than one 'rank_features' field that contain dots in their names and whose names partially overlap.	2023-02-14 16:40:21 +01:00
Benjamin Trent	323a13ac3f	Add `term` query support to rank_features mapped field (#93247 ) This adds term query capabilities for rank_features fields. term queries against rank_features are not scored in the typical way as regular fields. This is because the stored feature values take advantage of the term frequency storage mechanism, and thus regular BM25 does not work. Instead, a term query against a rank_features field is very similar to linear rank_feature query. If more complicated combinations of features and values are required, the rank_feature query should be used.	2023-02-01 13:32:13 -05:00
Adrien Grand	c21ee47610	Switch to Lucene's new IntField/LongField/FloatField/DoubleField. (#93165 ) Lucene introduced new numeric fields that index both points and doc values. This has the same semantics as indexing one field for points and another one for doc values as we did before, but covering both data structures in a single field yielded a speedup in Lucene's nightly benchmarks (see annotation [AH](http://people.apache.org/~mikemccand/lucenebench/sparseResults.html#index_throughput)) which would be interesting to get too. This commit does not switch to factory methods for queries such as `LongField#newRangeQuery` for now, we'll need to look into it in a follow-up.	2023-01-31 16:09:42 -05:00
Simon Cooper	c513b2bcc6	Migrate VersionedWriteable & NamedDiff to TransportVersion take 2 (#93242 ) Re-apply "Migrate VersionedWriteable & NamedDiff to TransportVersion (#93076)" This reverts commit `48f96090dc`.	2023-01-26 09:49:08 +00:00
Simon Cooper	48f96090dc	Revert "Migrate VersionedWriteable & NamedDiff to TransportVersion (#93076 )" This reverts commit `bef85c66e7`.	2023-01-25 16:16:10 +00:00
Simon Cooper	bef85c66e7	Migrate VersionedWriteable & NamedDiff to TransportVersion (#93076 ) InferenceConfig is kept on Version, as that existed before VersionedNamedWriteable came along	2023-01-25 16:03:38 +00:00
Martijn van Groningen	07b9d2f296	Don't index counter and gauge scaled_float/unsigned_long fields by default (#92917 ) This is the same change as #92768 but then for `scaled_float` and `unsigned_long` field types. The support for these field types was forgotten.	2023-01-16 13:13:06 +01:00
Alan Woodward	c720cdbbbf	Replace SourceLookup with SourceProvider interface (#91540 ) SourceLookup mixes up several concerns - lazy loading, map access to scripts, different access providers - and duplicates logic (such as that choosing how to apply filtering) that is better handled directly in the Source interface. This commit removes SourceLookup entirely and replaces it with a new SourceProvider interface, with a simple stored fields reader implementation. SearchLookup implements this interface directly, and the fetch phase uses a custom implementation to provide its separately loaded source to fetch-time scripts.	2023-01-12 16:17:46 +00:00
Luca Cavanna	c53becb310	Refactor enum mappings parameter to allow for capital case types (#92548 ) We have a couple of existing enum mappings parameters that specify their enum type in lowercase letters. That is convenient to avoid having to convert enum names back and forth between uppercase and lowercase depending on what's needed, yet it does not follow coding conventions in that constants should be in capital letters. More importantly, moving the `on_script_error` mapping parameter to a streamlined enum mapping parameter is not possible with lowercase type names because one of its values is `continue`, which is a Java reserved keyword. It becomes a requirement that the actual value for an enum based mapping parameter can potentially differ from the enum name. In general, the type name will be in capital letters, while the parameter value will be lowercase. With this commit we refactor the enum mappings parameter to provide their types in capital case, while users will keep on providing the corresponding values in lowercase. This only affects how the enum types are represented internally. We can leverage toString for the enum types to do the lowercasing when needed.	2023-01-11 23:28:54 +01:00
Artem Prigoda	2bc7398754	Use `Strings.format` instead of `String.format(Locale.ROOT, ...)` in tests (#92106 ) Use the locale-independent `Strings.format` method instead of `String.format(Locale.ROOT, ...)`. Inline `ESTestCase.forbidden` calls with `Strings.format` for consistency's sake. Add `Strings.format` alias in `common.Strings`	2023-01-03 19:28:27 +01:00
Mark Vieira	c2eda511de	Add JUnit rule based integration test cluster orchestration framework (#92379 ) This commit adds a new test framework for configuring and orchestrating test clusters for both Java and YAML REST testing. This will eventually replace the existing "test-clusters" Gradle plugin and the build-time cluster orchestration.	2022-12-21 15:33:46 -08:00
Nik Everett	9d0b0bad86	Support synthetic _source for _doc_count field (#91465 ) This adds synthetic `_source` support for the `_doc_count` field so downsampling should play nicely with synthetic `_source`.	2022-11-10 13:43:33 -05:00
Nik Everett	74d0d19c0f	Synthetic _source: support `field` in many cases (#89950 ) This adds support for the `field` scripting API in many but not all cases. Before this change numbers, dates, and IPs supported the `field` API when running with _source in synthetic mode because they always have doc values. This change adds support for `match_only_text`, `store`d `keyword` fields, and `store`d `text` fields. Two remaining field configurations work with synthetic _source and do not work with `field`: * A `text` field with a sub-`keyword` field that has `doc_values` * A `text` field with a sub-`keyword` field that is `store`d	2022-11-10 10:44:06 -05:00
Alan Woodward	41ab45a5d9	Report synthetic source status in MapperBuilderContext (#91400 ) We currently work out whether or not a mapper should be storing additional values for synthetic source by looking at the DocumentParserContext. However, this value does not change for the lifetime of the mapper - it is defined by metadata on the root mapper and is immutable - and DocumentParserContext feels like the wrong place for this information as it holds context specific to the document being parsed. This commit moves synthetic source status information from DocumentParserContext to MapperBuilderContext instead. Mappers which need this information retrieve it at build time and hold it on final fields.	2022-11-08 14:55:16 +00:00
Alan Woodward	34a0093928	Add a new values source type returned by metric counters (#90680 ) This commit adds a new values source type to identify counter fields in TSDB indices. Because aggregate values over this field only make sense in specific time series contexts, this should prevent users trying to issue nonsensical aggregations (eg, sum or avg) over counters.	2022-10-31 20:42:56 +00:00
Alan Woodward	0013d46538	Extract Source interface from SourceLookup (#90762 ) SourceLookup combines a mutable lookup object that can be advanced to different documents with access to a document's source. This combination can make reasoning about where a Source comes from difficult, particularly in the FetchPhase where the source gets passed around a great deal. This commit extracts a Source interface from SourceLookup, giving read-only access to the source, and changes various FetchPhase interfaces to take this read-only view instead of a full lookup. You can now tell easily if a consumer of the source is going to try and move it to a different document. As part of this change we add a new docId parameter to various ValueFetcher methods, as previously this could be accessed via the SourceLookup.	2022-10-11 19:50:30 +01:00
Rene Groeschke	43a0377735	Update forbiddenapis to 3.4 (#90624 ) Fix breaking changes to source validation after change in default jdk rule set	2022-10-06 16:52:06 +02:00
Nik Everett	bc49392bfb	Support malformed numbers in synthetic _source (#90428 ) This adds support for `ignore_malformed` to numeric fields other than `scaled_float` in synthetic `_source`. Their values are saved to a stored field and loaded to render the `_source`.	2022-10-04 12:17:30 -04:00
Nik Everett	f4fad2548f	Always support ignore_malformed in the same way (#90565 ) This makes sure that all field types that support `ignore_malformed` do so in the same way. Production changes: * Every mapper has an `ignoreMalformed` method that must return `true` if the field accepts the `ignore_malformed` mapping parameter and it was configured. It defaults to `false` because many fields either don't have a concept of a "malformed" value or don't have the ability to ignore malformed values. * Fix the `scaled_float` field to store its field name in `_ignored` if it ignores any malformed values. This is how all other field mappers work. Test changes: * `MapperTestCase` forces subclasses to declare whether they `supportIgnoreMalformed` or not. * If `MapperTestCase` subclasses `supportIgnoreMalformed` they must define some `exampleMalformedValues`. * `MapperTestCase` always grows three new tests: * One that creates a field without setting `ignore_malformed` and verifies that all `exampleMalformedValues` throw expected errors * One that explicitly configures `ignore_malformed` to false and, if `supportIgnoreMalformed`, verifies the errors again. If not `supportIgnoreMalformed` it verifies that the parameter is unknown. * One that explicitly configures `ignore_malformed` to true and, if `supportIgnoreMalformed`, verifies that parsing doesn't produce errors and correctly produces `_ignored`. If not `supportIgnoreMalformed` it verifies that the parameter is unknown. * Moved some subclasses of `MapperTestCase` from `internalClusterTests` to `tests`. This isn't strictly required but that's the right place for them.	2022-10-03 06:18:02 -04:00
Alan Woodward	de76c62546	Don't use fielddataBuilder to test for aggregatability (#90185 ) The current default implementation of isAggregatable on MappedFieldType tries to construct a field data builder, and returns true or false depending on whether an exception was thrown during construction. This is fairly fragile, and is becoming increasingly so with the introduction of field data contexts, so that a non-aggregatable field type may in fact provide field data to scripts. This commit changes the default implementation to check for docvalues instead of directly building a fielddata builder, and adds checks to MapperTestCase that verify these implementations work correctly.	2022-09-23 13:33:45 +01:00

270 Commits
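
Commit edab22a31c in the log above explains that terms must be held in a consistent order so that per-term scores are summed in the same order each time. The following is a minimal sketch of why that matters, not the Elasticsearch code itself: the class `TermOrderDemo` and its methods are hypothetical names chosen for illustration. It shows that `float` addition depends on summation order, and that a `LinkedHashSet` deduplicates while iterating in insertion order.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

// Hypothetical demo class; not part of Elasticsearch or Lucene.
public class TermOrderDemo {

    // Sums floats left to right. Float addition is not associative,
    // so a different iteration order can produce a different total.
    static float sum(List<Float> values) {
        float total = 0f;
        for (float v : values) {
            total += v;
        }
        return total;
    }

    // Removes duplicates while keeping first-seen order,
    // which is the iteration guarantee LinkedHashSet provides.
    static List<String> collectInOrder(List<String> visited) {
        return new ArrayList<>(new LinkedHashSet<>(visited));
    }

    public static void main(String[] args) {
        // Same three values, two orders: the small value is lost in the first.
        System.out.println(sum(List.of(1e8f, 1f, -1e8f))); // prints 0.0
        System.out.println(sum(List.of(1e8f, -1e8f, 1f))); // prints 1.0
        System.out.println(collectInOrder(List.of("b", "a", "b", "c"))); // prints [b, a, c]
    }
}
```

A plain `HashSet` makes no ordering promise, so two queries collecting the same terms could sum them in different orders; an insertion-ordered set removes that source of variation.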