This change allows the type `unsigned_long` to be used in the index
sort specification. The internal representation uses plain longs so
we can rely on the existing support for that type.
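For example, a minimal sketch of an index that sorts on an `unsigned_long` field (index and field names here are illustrative):
```
PUT my-index
{
  "settings": {
    "index.sort.field": "counter",
    "index.sort.order": "desc"
  },
  "mappings": {
    "properties": {
      "counter": { "type": "unsigned_long" }
    }
  }
}
```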
ParseContext is used to parse documents. It was easily confused with ParserContext (now renamed to MappingParserContext) which is instead used to parse mappings.
To remove any confusion, this commit renames ParseContext to DocumentParserContext and adapts its subclasses accordingly.
Change the formatter config to sort / order imports, and reformat the
codebase. We already had a config file for Eclipse users, so Spotless now
uses that.
The "Eclipse Code Formatter" plugin ought to be able to use this file as
well for import ordering, but in my experiments the results were poor.
Instead, use IntelliJ's `.editorconfig` support to configure import
ordering.
I've also added a config file for the formatter plugin.
Other changes:
* I've quietly enabled the `toggleOnOff` option for Spotless. It was
already possible to disable formatting for sections using the markers
for docs snippets, so enabling this option just accepts this reality
and makes it possible via `formatter:off` and `formatter:on` without
the restrictions around line length. It should still only be used as
a very last resort and with good reason.
* I've removed mention of the `paddedCell` option from the contributing
guide, since I haven't had to use that option for a very long time. I
moved the docs to the spotless config.
Extract usage of internal API from TestClustersPlugin and PluginBuildPlugin and related plugins and build logic
This includes a refactoring of ElasticsearchDistribution to handle types
better, so that we can differentiate between the Elasticsearch distribution
types supported in TestClustersPlugin and those only supported
in internal plugins.
It also introduces a set of internal versions of public plugins.
As part of this we also generate the plugin descriptors now.
As a follow-up we can move these publicly used classes into
an extra project (declared as an included build).
We keep LoggedExec and VersionProperties effectively public, and add a workaround for RestTestBase.
The majority of field mappers read a single value from their positioned
XContentParser, and do not need to call nextToken. There is a general
assumption that the same holds for any multifields defined on them, and
so the XContentParser is passed down to their multifields builder as-is.
This assumption does not hold for mappers that accept JSON objects,
and so we have a second mechanism for passing values around called
'external values', where a mapper can set a specific value on its context
and child mappers can then check for these external values before reading
from xcontent. The disadvantage of this is that every field mapper now
needs to check its context for external values. Because the values are
defined by their Java class, we also know that in the vast majority of
cases this functionality is unused. We have only two mappers that actually
make use of this, CompletionFieldMapper and GeoPointFieldMapper.
This commit removes external values entirely, and replaces them with the ability
to pass a modified XContentParser to multifields. FieldMappers can just check
the parser attached to their context for data and don't need to worry about
multiple sources.
Plugins implementing field mappers will need to take the removal of external
values into account. Implementations that are passing structured objects
as external values should instead use ParseContext.switchParser and
wrap the objects using MapXContentParser.wrapObject().
GeoPointFieldMapper passes on a fake parser that just wraps its input data
formatted as a geohash; CompletionFieldMapper has a slightly more complicated
parser that in general wraps its metadata, but if textOrNull() is called without
the parser being advanced just returns its text input.
Relates to #56063
The FieldNamesFieldMapper is a metadata mapper defining a field that
can be used for exists queries if a mapper does not use doc values or
norms. Currently, data is added to it via a special method on FieldMapper
that pulls the metadata mapper from a mapping lookup, checks to see
if it is enabled, and then adds the relevant value to a Lucene document.
This is one of only two places that pulls a metadata mapper from the
MappingLookup, and it would be nice to remove this method. This commit
refactors field name handling by instead storing the names of fields to
index in the fieldnames field in a set on the ParseContext, and then
building the field itself in FieldNamesFieldMapper.postParse(). This means
that all of the responsibility for enabling indexing, etc., is handled within
the metadata mapper itself.
This commit adds a script parameter to long and double fields that makes
it possible to calculate a value for these fields at index time. It uses the same
script context as the equivalent runtime fields, and allows for multiple index-time
scripted fields to cross-refer while still checking for indirection loops.
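A minimal sketch of what such a mapping might look like (field names and the script itself are illustrative, following the runtime-field style `emit` syntax the commit refers to):
```
PUT my-index
{
  "mappings": {
    "properties": {
      "voltage": { "type": "double" },
      "voltage_corrected": {
        "type": "double",
        "script": {
          "source": "emit(doc['voltage'].value * params.multiplier)",
          "params": { "multiplier": 4 }
        }
      }
    }
  }
}
```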
We've had a few bugs in the fields API where it doesn't behave like we'd
expect. Typically this happens because it isn't obvious what we expect. So
we'll try and use randomized testing to ferret out what we want. This adds
a test for most field types that asserts that `fields` works similarly
to `docvalue_fields`. We expect this to be true for most fields.
It does so by forcing all subclasses of `MapperTestCase` to define a
method that makes random values. It declares a few other hooks that
subclasses can override to further randomize the test.
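Concretely, the equivalence being asserted is roughly that a request like the following (field name illustrative) returns the same values through both retrieval paths:
```
POST my-index/_search
{
  "query": { "match_all": {} },
  "fields": [ "price" ],
  "docvalue_fields": [ "price" ]
}
```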
We skip the test for a few field types that don't have doc values:
* `annotated_text`
* `completion`
* `search_as_you_type`
* `text`
We should come up with some way to test these without doc values, even
if it isn't as nice. But that is a problem for another time, I think.
We skip the test for a few more types just because I wanted to cut this
PR in half so we could get to reviewing it earlier. We'll get to those
in a follow up change.
I've filed a few bugs for things that are inconsistent with
`docvalue_fields`. Typically that means that we have to limit the
random values that we generate to those that *do* round trip properly.
The transport action for FieldCapabilities assumes the meta field for a MappedFieldType
is traversable. This commit adds a requirement to MappedFieldType itself to ensure that
it is implemented for all subtypes.
This reduces the ceremony of declaring test artifacts for a project.
It also solves an issue with the usage of the deprecated testRuntime
configuration that testArtifacts extended from, which seems not to be required
at all and would have broken with Gradle 7.0 anyhow.
Test artifact resolution is now variant aware, which gives the consuming
projects a more adequate compile and runtime classpath.
We also introduce a convention method in the Elasticsearch build to declare
test artifact dependencies in an easy way, close to how it's done by the Gradle
build in the test fixture plugin.
Furthermore, we cleaned up some inconsistent test dependency declarations where
a project relies on another project and on its test artifacts.
Also moves it to a top-level interface in fielddata. It is not only used by
DocValueFetcher any more, and Leaf does not really describe what
it does or what it provides.
This has been deprecated in Gradle before, but we haven't been warned.
Gradle 7.0 will likely introduce a change in behaviour here, so we
should fix the usage of this configuration upfront.
See https://github.com/gradle/gradle/issues/16027 for further information
about the change in Gradle 7.0
As per the new licensing change for Elasticsearch and Kibana this commit
moves existing Apache 2.0 licensed source code to the new dual license
SSPL+Elastic license 2.0. In addition, existing x-pack code now uses
the new version 2.0 of the Elastic license. Full changes include:
- Updating LICENSE and NOTICE files throughout the code base, as well
as those packaged in our published artifacts
- Update IDE integration to now use the new license header on newly
created source files
- Remove references to the "OSS" distribution from our documentation
- Update build time verification checks to no longer allow Apache 2.0
license header in Elasticsearch source code
- Replace all existing Apache 2.0 license headers for non-xpack code
with updated header (vendored code with Apache 2.0 headers obviously
remains the same).
- Replace all Elastic license 1.0 headers with new 2.0 header in xpack.
We decided to rename `QueryShardContext` to clarify that it supports all parts
of search request execution. Before there was confusion over whether it should
only be used for building queries, or maybe only used in the query phase. This
PR also updates the javadocs.
Closes #64740.
This PR simplifies how the document source is passed to each fetch subphase. A summary of the strategy:
* For each document, we try to eagerly load the source and store it on `HitContext`. Most subphases that access source, like source filtering and highlighting, use `HitContext`. For nested hits, we filter the parent source and also store this source on `HitContext`.
* Only for non-nested documents, we also store the loaded source on `QueryShardContext#lookup`. This allows subphases that access source through `SearchLookup` to use the pre-loaded source when possible. This is now a common occurrence, since runtime fields are supported in the 'fields' option and may soon be supported in highlighting.
There is no longer a special `SearchLookup` just for the fetch phase. This was not necessary and was mostly caused by a misunderstanding of how `QueryShardContext` should be used.
Addresses #62511.
This speeds up `date_histogram` aggregations without a parent or
children. This is quite common - it's the aggregation that Kibana's Discover
uses all over the place. Also, we hope to be able to use the same
mechanism to speed aggs with children one day, but that day isn't today.
The kind of speedup we're seeing is fairly substantial in many cases:
```
| Metric | Operation | Before | After | Unit |
| 90th percentile service time | date_histogram_calendar_interval | 9266.07 | 1376.13 | ms |
| 90th percentile service time | date_histogram_calendar_interval_with_tz | 9217.21 | 1372.67 | ms |
| 90th percentile service time | date_histogram_fixed_interval | 8817.36 | 1312.67 | ms |
| 90th percentile service time | date_histogram_fixed_interval_with_tz | 8801.71 | 1311.69 | ms | <-- discover's agg
| 90th percentile service time | date_histogram_fixed_interval_with_metrics | 44660.2 | 43789.5 | ms |
```
This uses the work we did in #61467 to precompute the rounding points for
a `date_histogram`. Now, when we know the rounding points we execute the
`date_histogram` as a `range` aggregation. This is nice for a few reasons:
1. We can further rewrite the `range` aggregation (see below)
2. We don't need to allocate a hash to convert rounding points
to ordinals.
3. We can send precise cardinality estimates to sub-aggs.
Points 2 and 3 above are nice, but most of the speed difference comes from
point 1. Specifically, we now look into executing `range` aggregations as
a `filters` aggregation. Normally the `filters` aggregation is quite slow
but when it doesn't have a parent or any children then we can execute it
"filter by filter" which is significantly faster. So fast, in fact, that
it is faster than the original `date_histogram`.
The `range` aggregation is *fairly* careful in how it rewrites, giving up
on the `filters` aggregation if it won't collect "filter by filter" and
falling back to its original execution mechanism.
So an aggregation like this:
```
POST _search
{
"size": 0,
"query": {
"range": {
"dropoff_datetime": {
"gte": "2015-01-01 00:00:00",
"lt": "2016-01-01 00:00:00"
}
}
},
"aggs": {
"dropoffs_over_time": {
"date_histogram": {
"field": "dropoff_datetime",
"fixed_interval": "60d",
"time_zone": "America/New_York"
}
}
}
}
```
is executed like:
```
POST _search
{
"size": 0,
"query": {
"range": {
"dropoff_datetime": {
"gte": "2015-01-01 00:00:00",
"lt": "2016-01-01 00:00:00"
}
}
},
"aggs": {
"dropoffs_over_time": {
"range": {
"field": "dropoff_datetime",
"ranges": [
{"from": 1415250000000, "to": 1420434000000},
{"from": 1420434000000, "to": 1425618000000},
{"from": 1425618000000, "to": 1430798400000},
{"from": 1430798400000, "to": 1435982400000},
{"from": 1435982400000, "to": 1441166400000},
{"from": 1441166400000, "to": 1446350400000},
{"from": 1446350400000, "to": 1451538000000},
{"from": 1451538000000}
]
}
}
}
}
```
Which in turn is executed like this:
```
POST _search
{
"size": 0,
"query": {
"range": {
"dropoff_datetime": {
"gte": "2015-01-01 00:00:00",
"lt": "2016-01-01 00:00:00"
}
}
},
"aggs": {
"dropoffs_over_time": {
"filters": {
"filters": {
"1": {"range": {"dropoff_datetime": {"gte": "2014-12-30 00:00:00", "lt": "2015-01-05 05:00:00"}}},
"2": {"range": {"dropoff_datetime": {"gte": "2015-01-05 05:00:00", "lt": "2015-03-06 05:00:00"}}},
"3": {"range": {"dropoff_datetime": {"gte": "2015-03-06 00:00:00", "lt": "2015-05-05 00:00:00"}}},
"4": {"range": {"dropoff_datetime": {"gte": "2015-05-05 00:00:00", "lt": "2015-07-04 00:00:00"}}},
"5": {"range": {"dropoff_datetime": {"gte": "2015-07-04 00:00:00", "lt": "2015-09-02 00:00:00"}}},
"6": {"range": {"dropoff_datetime": {"gte": "2015-09-02 00:00:00", "lt": "2015-11-01 00:00:00"}}},
"7": {"range": {"dropoff_datetime": {"gte": "2015-11-01 00:00:00", "lt": "2015-12-31 00:00:00"}}},
"8": {"range": {"dropoff_datetime": {"gte": "2015-12-31 00:00:00"}}}
}
}
}
}
}
```
And *that* is faster because we can execute it "filter by filter".
Finally, notice the `range` query filtering the data. That is required for
the data set that I'm using for testing. The "filter by filter" collection
mechanism for the `filters` agg needs special case handling when the query
is a `range` query and the filter is a `range` query and they are both on
the same field. That special case handling "merges" the range query.
Without it "filter by filter" collection is substantially slower. Its still
quite a bit quicker than the standard `filter` collection, but not nearly
as fast as it could be.
Mapper.BuilderContext is a simple wrapper around two objects, some
IndexSettings and a ContentPath. The IndexSettings are the same as
those provided in the ParserContext, so we can simplify things here
by removing them and just passing ContentPath directly to
Mapper.Builder#build()
The signature of MappedFieldType#valueFetcher requires MapperService as an argument, which is unfortunate, as that is one of the reasons why FetchContext exposes the whole MapperService.
Such use of MapperService can be replaced by exposing the QueryShardContext, which encapsulates the MapperService.
Now that all our FieldMapper implementations extend ParametrizedFieldMapper,
we can collapse the two classes together, and remove a load of cruft from
FieldMapper that is unused. In particular:
* we no longer need the Lucene FieldType field on FieldMapper
* we no longer use clone() for merging, so we can remove it from all impls
* the serialization code in FieldMapper that assumes we're looking at text fields can go
We currently use TextSearchInfo to let query parsers know when a field
will support match queries. Some field types (numeric, constant, range)
can produce simple match queries that don't use the terms index, and
it is useful to distinguish between these fields on the one hand and
keyword/text-type fields on the other. In particular, the
SignificantTextAggregation only works on fields that have indexed terms,
but there is currently no efficient way to see this at search time and so
the factory falls back on checking to see if an index analyzer has been
defined, with the result that some nonsensical field types are permitted.
This commit adds a new static TextSearchInfo implementation called
SIMPLE_MATCH_WITHOUT_TERMS that can be returned by field
types with no corresponding terms index. It changes significant text
to check for this rather than for the presence of an index analyzer.
This is a breaking change, in that the significant text agg will now throw
an error up-front if you try and apply it to a numeric field, whereas before
you would get an empty result.
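For reference, a sketch of the kind of request this affects (index and field names illustrative); pointing `significant_text` at a numeric field in a request like this now fails up-front instead of returning empty buckets:
```
POST my-index/_search
{
  "size": 0,
  "query": { "match": { "message": "error" } },
  "aggs": {
    "significant_words": {
      "significant_text": { "field": "message" }
    }
  }
}
```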
Max and min aggs were producing wrong results for an unsigned_long field
if the field was indexed. If the field is indexed, max/min aggs use
values from the indexed points instead of field data; these values
are derived using the method pointReaderIfPossible. Previously,
UnsignedLongFieldType#pointReaderIfPossible was producing incorrect
values, as it failed to shift them back to the original
values.
This patch fixes pointReaderIfPossible to produce
the correct original values.
Relates to #60050
UnsignedLongTests for the range agg was using very specific intervals
that the double type cannot distinguish due to lack of precision:
9.223372036854776000E18 == 9.223372036854775807E18 returns true.
If we add the corresponding range query test, it will return a different
number of hits than the range agg, as the range query, unlike the range agg,
doesn't convert values to the double type and is hence more precise.
This patch makes the ranges for the range agg test broader (so values
converted to doubles don't lose precision), and hence the corresponding
range query will return the same number of hits.
Relates to #60050
Collapse was not working on the unsigned_long field,
as collapsing was enabled only on KeywordFieldType and NumberFieldType.
This introduces a new method `collapseType` on MappedFieldType,
which is checked to decide if collapsing should be enabled.
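A sketch of the kind of request that now works, assuming `account_id` is mapped as `unsigned_long` (index and field names illustrative):
```
POST my-index/_search
{
  "query": { "match_all": {} },
  "collapse": { "field": "account_id" }
}
```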
Relates to #60050
Some supported field types don't support term queries, and throw an exception in their termQuery method. That exception is either an IllegalArgumentException or a QueryShardException. There is logic in MatchQuery that decides whether to skip the field depending on the exception that is thrown.
Also, such field types should hold TextSearchInfo.NONE, but that is not always the case.
With this commit we make the following changes:
- streamline using TextSearchInfo.NONE in all field types that don't support text queries
- standardize the exception thrown when a field type does not support term queries to be IllegalArgumentException. Note that this is not a breaking change as both exceptions previously translated to a 400 status code.
- adapt the MatchQuery logic to skip fields that don't support term queries. There is no need to call termQuery passing an empty string and catch exceptions potentially thrown. We can rather check the TextSearchInfo, which already tells whether the field supports text queries or not.
- add a test method to MapperTestCase that verifies the consistency of a field type by verifying that it is not searchable whenever it uses TextSearchInfo.NONE, and that it is searchable otherwise. This is what triggered all of the above changes.
When constructing a value fetcher, the 'parsesArrayValue' flag must match
`FieldMapper#parsesArrayValue`. However there is nothing in code or tests to
help enforce this.
This PR reworks the value fetcher constructors so that `parsesArrayValue` is
'false' by default. Just as for `FieldMapper#parsesArrayValue`, field types must
explicitly set it to true and ensure the behavior is covered by tests.
Follow-up to #62974.
This fixes fields retrieval on the unsigned_long field:
1) For docvalue_fields, a custom UnsignedLongLeafFieldData::getLeafValueFetcher
is implemented that correctly retrieves doc values.
2) For stored fields, an error was fixed in UnsignedLongFieldMapper
in how stored values were stored. Previously they were incorrectly
stored in the shifted format; now they are stored as the original
values in String format.
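A sketch of the retrieval paths this fixes, assuming a stored `unsigned_long` field named `counter` (index and field names illustrative):
```
PUT my-index
{
  "mappings": {
    "properties": {
      "counter": { "type": "unsigned_long", "store": true }
    }
  }
}

POST my-index/_search
{
  "docvalue_fields": [ "counter" ],
  "stored_fields": [ "counter" ]
}
```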
Relates to #60050
MapperService carries a lot of weight and is only used to determine if loading of field data for the id field is enabled, which can be done in a different way. There was another usage that recently went away with the removal of `TypeFieldMapper`.
For runtime fields, we will want to do all search-time interaction with
a field definition via a MappedFieldType, rather than a FieldMapper, to
avoid interfering with the logic of document parsing. Currently, fetching
values for runtime scripts and for building top hits responses need to
call a method on FieldMapper. This commit moves this method to
MappedFieldType, incidentally simplifying the current call sites and freeing
us up to implement runtime fields as pure MappedFieldType objects.
- Modify Lucene::readSortValue to read BigInteger as a sortValue.
In #60050 writeSortValue was modified, but readSortValue was forgotten.
- Adjust yml tests to v7.10 as unsigned_long has been backported
- Change MapperTests to use proper document IDs
Relates to #60050
Test testSortDifferentFormatsShouldFail was occasionally failing for
2 reasons:
1) Documents on "idx2" were not available for search before a
search request started
2) Running a test multiple times was causing
an occasional ResourceAlreadyExistsException for idx2,
as idx2 was not deleted between tests.
This patch makes the following fixes:
1) Sets up an immediate refresh policy for docs in the index "idx2"
2) Creates the index idx2 only once per cluster
Closes: #62997
This commit adds a mechanism to MapperTestCase that allows implementing
test classes to check that their parameters can be updated, or throw conflict
errors as advertised. Child classes override the registerParameters method
and tell the passed-in UpdateChecker class about their parameters. Simple
conflicts can be checked, using the existing minimal mappings as a base to
compare against, or alternatively a particular initial mapping can be provided
to check edge cases (eg, norms can be updated from true to false, but not
vice versa). Updates are registered with a predicate that checks that the update
has in fact been applied to the resulting FieldMapper.
Fixes #61631
This field type supports:
- indexing of integer values in the range [0, 18446744073709551615]
- precise queries (term, range)
- precise sort and terms aggregations
- other aggregations, which are based on converting long values
to double and can be imprecise for large values
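A minimal sketch of the new field type in use (index, field name and values are illustrative):
```
PUT my-index
{
  "mappings": {
    "properties": {
      "counter": { "type": "unsigned_long" }
    }
  }
}

POST my-index/_doc
{ "counter": 18446744073709551615 }

POST my-index/_search
{
  "query": {
    "range": { "counter": { "gte": "9223372036854775808" } }
  },
  "sort": [ { "counter": "desc" } ]
}
```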
Closes #32434