elasticsearch

Commit Graph

Author	SHA1	Message	Date
Przemyslaw Gomulka	ec7d9d22cd	[Rest Api Compatibility] Enable parent_join inner_hits test (#75560 ) The test in 7.x was fixed in #75534	2021-07-21 09:42:11 +02:00
Luca Cavanna	c6641bf00c	Rename ParseContext to DocumentParserContext (#74963 ) ParseContext is used to parse documents. It was easily confused with ParserContext (now renamed to MappingParserContext) which is instead used to parse mappings. To remove any confusion, this commit renames ParseContext to DocumentParserContext and adapts its subclasses accordingly.	2021-07-06 09:15:59 -04:00
Przemyslaw Gomulka	5ac94b5263	[Rest Api Compatibility] Enable tests that are already fixed (#74174 ) With types removal changes being available under rest api compatibility I have removed the block entries for tests which are already fixed relates #51816	2021-06-29 09:11:31 +02:00
Ryan Ernst	63012c8a40	Move ParseField to o.e.c.xcontent (#73923 ) ParseField is part of the x-content lib, yet it doesn't exist under the same root package as the rest of the lib. This commit moves the class to the appropriate package. relates #73784	2021-06-08 13:32:14 -07:00
Ryan Ernst	68817d7ca2	Rename o.e.common in libs/core to o.e.core (#73909 ) When libs/core was created, several classes were moved from server's o.e.common package, but they were not moved to a new package. Split packages need to go away long term, so that Elasticsearch can even think about modularization. This commit moves all the classes under o.e.common in core to o.e.core. relates #73784	2021-06-08 09:53:28 -07:00
Julie Tibshirani	59000da936	Fix typo in ParentIdFieldMapper comment	2021-06-07 08:49:58 -07:00
Julie Tibshirani	58c1477095	Adjust REST test skip version for join field retrieval The version can be updated now that the test was backported.	2021-06-05 17:14:25 -07:00
Julie Tibshirani	dc86babfe6	Fix error when fetching values for parent ID join field (#73639 ) The parent ID join field is an internal field that links child documents to their parent. Although it's internal, we include it when listing all field types. This means a search with `"fields": "*"` can attempt to fetch values from the parent ID field and fail. This PR applies a simple fix to return an empty result instead of failing.	2021-06-04 11:26:45 -07:00
Luca Cavanna	05ca9cf876	Remove getMatchingFieldTypes method (#73655 ) FieldTypeLookup and MappingLookup expose the getMatchingFieldTypes method to look up matching field types by a string pattern. We have migrated ExistsQueryBuilder to instead rely on getMatchingFieldNames, hence we can go ahead and remove the remaining usages and the method itself. The remaining usages are to find specific field types from the mappings, specifically to eagerly load global ordinals and for the join field type. These are operations that are performed only once when loading the mappings, and may be refactored to work differently in the future. For now, we remove getMatchingFieldTypes and, for the two mentioned scenarios, instead call getMatchingFieldNames(*) and then getFieldType for each of the returned field names. This is a bit wasteful but performance can be sacrificed for these scenarios in favour of less code to maintain.	2021-06-03 10:01:22 +02:00
Alan Woodward	3bd594ebe8	Replace simpleMatchToFullName (#72674 ) MappingLookup has a method simpleMatchToFullName that attempts to return all field names that match a given pattern; if no patterns match, then it returns a single-valued collection containing just the pattern that was originally passed in. This is a fairly confusing semantic. This PR replaces simpleMatchToFullName with two new methods: * getMatchingFieldNames(), which returns a set of all mapped field names that match a pattern. Calling getFieldType() with a name returned by this method is guaranteed to return a non-null MappedFieldType * getMatchingFieldTypes(), which returns a collection of all MappedFieldTypes in a mapping that match the passed-in pattern. This allows us to clean up several call-sites because we know that MappedFieldTypes returned from these calls will never be null. It also simplifies object field exists query construction.	2021-05-13 11:35:23 +01:00
Alan Woodward	f2ac4f9953	Avoid using external values in parent-join and percolator mappers (#71834 ) We would like to remove the use of 'external values' in document parsing. This commit simplifies two of the four places it is currently used, by adding direct indexValue methods to BinaryFieldMapper and ParentIdFieldMapper. Relates to #56063	2021-04-20 12:18:42 +01:00
Jake Landis	279fde375e	Apply REST API compatibility testing for the :modules (#71137 )	2021-04-02 11:20:54 -05:00
Mark Vieira	6339691fe3	Consolidate REST API specifications and publish under Apache 2.0 license (#70036 )	2021-03-26 16:20:14 -07:00
Alan Woodward	19da36ab86	Remove MappedFieldType#setEagerGlobalOrdinals (#70920 ) This is the only remaining setter on MappedFieldType, and removing it makes the base class entirely final. We now only override the eagerGlobalOrdinals method on types that actually support it.	2021-03-26 17:03:29 +00:00
Luca Cavanna	edb42690bc	Split RuntimeFieldType from corresponding MappedFieldType (#70695 ) So far the runtime section supports only leaf field types, hence the internal representation is based on `RuntimeFieldType` that extends directly `MappedFieldType`. This is straightforward but it is limiting for e.g. an alias field that points to another field, or for object fields that are not queryable directly, hence should not be a MappedFieldType, even though their subfields should be. This commit makes `RuntimeFieldType` an interface, effectively splitting the definition of a runtime field as defined and returned in the mappings, from its internal representation in terms of `MappedFieldType`. The existing runtime script field types still extend `MappedFieldType` and now also implement the new interface, which makes the change rather simple.	2021-03-23 10:57:44 +01:00
Jim Ferenczi	ff50da5a77	Remove the _parent_join metadata field (#70143 ) This commit removes the metadata field _parent_join that was needed to ensure that only one join field is used in a mapping. It is replaced with a validation at the field level. This change also fixes a [bug](https://github.com/elastic/kibana/issues/92960) in the handling of parent join fields in _field_caps. This metadata field throws an unexpected exception in [7.11](https://github.com/elastic/elasticsearch/pull/63878) when checking if the field is aggregatable. That's now fixed since this unused field has been removed.	2021-03-10 09:19:30 +01:00
Alan Woodward	139ff8657a	Require `meta` field for MappedFieldType to be non-null (#70145 ) The transport action for FieldCapabilities assumes the meta field for a MappedFieldType is traversable. This commit adds a requirement to MappedFieldType itself to ensure that it is implemented for all subtypes.	2021-03-09 15:40:03 +00:00
Nik Everett	10e2f90560	Speed up aggs with sub-aggregations (#69806 ) This allows many of the optimizations added in #63643 and #68871 to run on aggregations with sub-aggregations. This should: * Speed up `terms` aggregations on fields with less than 1000 values that also have sub-aggregations. Locally I see 2 second searches run in 1.2 seconds. * Applies that same speedup to `range` and `date_histogram` aggregations but it feels less impressive because the point range queries are a little slower to get up and go. * Massively speed up `filters` aggregations with sub-aggregations that don't have a `parent` aggregation or collect "other" buckets. Also save a ton of memory while collecting them.	2021-03-03 18:04:47 -05:00
Igor Motov	0bbc6addd9	Revert "Remove aggregation's postCollect phase (#68615 ) This partially reverts #64016 and adds #67839 and adds additional tests that would have caught issues with the changes in #64016. It's mostly Nik's code, I am just cleaning things up a bit. Co-authored-by: Nik Everett <nik9000@gmail.com>	2021-02-10 19:12:50 -05:00
Rory Hunter	2d44cce31e	Replace NOT operator with explicit `false` check - part 9 (#68645 ) Part 9. We have an in-house rule to compare explicitly against `false` instead of using the logical not operator (`!`). However, this hasn't historically been enforced, meaning that there are many violations in the source at present. We now have a Checkstyle rule that can detect these cases, but before we can turn it on, we need to fix the existing violations. This is being done over a series of PRs, since there are a lot to fix.	2021-02-08 15:28:57 +00:00
Mark Vieira	a92a647b9f	Update sources with new SSPL+Elastic-2.0 license headers As per the new licensing change for Elasticsearch and Kibana this commit moves existing Apache 2.0 licensed source code to the new dual license SSPL+Elastic license 2.0. In addition, existing x-pack code now uses the new version 2.0 of the Elastic license. Full changes include: - Updating LICENSE and NOTICE files throughout the code base, as well as those packaged in our published artifacts - Update IDE integration to now use the new license header on newly created source files - Remove references to the "OSS" distribution from our documentation - Update build time verification checks to no longer allow Apache 2.0 license header in Elasticsearch source code - Replace all existing Apache 2.0 license headers for non-xpack code with updated header (vendored code with Apache 2.0 headers obviously remains the same). - Replace all Elastic license 1.0 headers with new 2.0 header in xpack.	2021-02-02 16:10:53 -08:00
Rory Hunter	ad1f876daa	Replace NOT operator with explicit `false` check (#67817 ) We have an in-house rule to compare explicitly against `false` instead of using the logical not operator (`!`). However, this hasn't historically been enforced, meaning that there are many violations in the source at present. We now have a Checkstyle rule that can detect these cases, but before we can turn it on, we need to fix the existing violations. This is being done over a series of PRs, since there are a lot to fix.	2021-01-26 14:47:09 +00:00
Julie Tibshirani	5852fbedf5	Rename QueryShardContext -> SearchExecutionContext. (#67490 ) We decided to rename `QueryShardContext` to clarify that it supports all parts of search request execution. Before there was confusion over whether it should only be used for building queries, or maybe only used in the query phase. This PR also updates the javadocs. Closes #64740.	2021-01-14 09:11:59 -08:00
Mark Tozzi	e26c9bbd52	Rename BYTES ValuesSourceType to reflect intended usage (#66762 )	2020-12-30 12:39:17 -05:00
Julie Tibshirani	d0683141f4	Ensure all query builder tests consider older versions. (#66401 ) This PR removes outdated overrides in some tests that prevent them from testing older index versions. Also removes an old comment + logic from AggregatorFactoriesTests.	2020-12-16 09:19:26 -08:00
Nik Everett	7b3c6f2a0c	Further clean up in AggregatorTestCase (#66395 ) Drops `AggregatorTestCase#mapperServiceMock` because it is getting in the way of other work I'm doing for runtime fields. It was only overridden to test the `parent` and `child` aggregation to add the `MappedFieldType`s for join fields in the backdoor. Those aggregations can just as easily add those fields in the normal method calls.	2020-12-16 11:56:04 -05:00
Armin Braun	06a31a0aca	Add List Append Utility Method (#65576 ) (list -> copy -> add one -> wrap immutable) is a pretty common pattern in CS updates and tests => added a shortcut for it here and used it in easily identifiable spots.	2020-12-01 02:47:21 +01:00
Nik Everett	c227554080	Remove SearchContext from constructing aggregations (#64953 ) This replaces the `SearchContext` passed to the ctor of `Aggregation`s with `AggregationContext`. It ends up adding a fairly large number of methods to `AggregationContext` but in exchange it shows a path to removing a few methods from `SearchContext`. That seems nice! It also gives us an accurate inventory of "all of the stuff" that aggregations use to build and run.	2020-11-30 13:19:44 -05:00
Julie Tibshirani	f4a462d05e	Simplify how source is passed to fetch subphases. (#65292 ) This PR simplifies how the document source is passed to each fetch subphase. A summary of the strategy: * For each document, we try to eagerly load the source and store it on `HitContext`. Most subphases that access source, like source filtering and highlighting, use `HitContext`. For nested hits, we filter the parent source and also store this source on `HitContext`. * Only for non-nested documents, we also store the loaded source on `QueryShardContext#lookup`. This allows subphases that access source through `SearchLookup` to use the pre-loaded source when possible. This is now a common occurrence, since runtime fields are supported in the 'fields' option and may soon be supported in highlighting. There is no longer a special `SearchLookup` just for the fetch phase. This was not necessary and was mostly caused by a misunderstanding of how `QueryShardContext` should be used. Addresses #62511.	2020-11-20 14:09:41 -08:00
Alan Woodward	0fd70ae383	Remove Mapper.BuilderContext (#64625 ) Mapper.BuilderContext is a simple wrapper around two objects, some IndexSettings and a ContentPath. The IndexSettings are the same as those provided in the ParserContext, so we can simplify things here by removing them and just passing ContentPath directly to Mapper.Builder#build()	2020-11-05 10:48:39 +00:00
Luca Cavanna	f1e9aec8dc	Replace more MapperService usages in favour of QueryShardContext (#64584 ) This commit replaces most of the leftover direct access to MapperService from SearchContext and FetchContext with accessing QueryShardContext instead, which wraps the MapperService and exposes a subset of its functionality needed when executing the different phases of search	2020-11-04 15:49:38 +01:00
Alan Woodward	f010269ab7	Move index analyzer management to FieldMapper/MapperService (#63937 ) Index-time analyzers are currently specified on the MappedFieldType. This has a number of unfortunate consequences; for example, field mappers that index data into implementation sub-fields, such as prefix or phrase accelerators on text fields, need to expose these sub-fields as MappedFieldTypes, which means that they then appear in field caps, are externally searchable, etc. It also adds index-time logic to a class that should only be concerned with search-time behaviour. This commit removes references to the index analyzer from MappedFieldType. Instead, FieldMappers that use the terms index can pass either a single analyzer or a Map of fields to analyzers to their super constructor, which are then exposed via a new FieldMapper#indexAnalyzers() method; all index-time analysis is mediated through the delegating analyzer wrapper on MapperService. In a follow-up, this will make it possible to register multiple field analyzers from a single FieldMapper, removing the need for 'hidden' mapper implementations on text field, parent joins, and elsewhere.	2020-11-04 13:53:09 +00:00
Luca Cavanna	344ad33a16	Remove ValueFetcher dependency from MapperService (#64524 ) The signature of MappedFieldType#valueFetcher requires MapperService as an argument which is unfortunate as that is one of the reasons why FetchContext exposes the whole MapperService. Such use of MapperService can be replaced with exposing the QueryShardContext which encapsulates the MapperService.	2020-11-04 12:08:34 +01:00
Alan Woodward	a5168572d5	Collapse ParametrizedFieldMapper into FieldMapper (#64365 ) Now that all our FieldMapper implementations extend ParametrizedFieldMapper, we can collapse the two classes together, and remove a load of cruft from FieldMapper that is unused. In particular: * we no longer need the lucene FieldType field on FieldMapper * we no longer use clone() for merging, so we can remove it from all impls * the serialization code in FieldMapper that assumes we're looking at text fields can go	2020-11-02 15:07:52 +00:00
Nik Everett	3af540b50d	Remove aggregation's postCollect phase (#64016 ) After #63811 it became clear to me that `postCollect` is kind of dangerous and not all that useful. So this removes it. The trouble with `postCollect` is that it all happened right after we finished calling `collect` on the `LeafBucketCollectors` but before we built the aggregation results. But in #63811 we found out that we can't call `postCollect` on the children of `parent` or `child` aggregators until we know which aggregation results we're building. So this removes `postCollect` and moves all of the things we did at post-collect phase into `buildAggregations` or into hooks called in those methods.	2020-10-28 17:33:27 -04:00
Nik Everett	d2043a4b12	Add more tests for parent/child aggs I broke the `parent` and `child` agg something fierce in #57892 and fixed it in #63811. This adds more tests for that fix mimicking other reported failures.	2020-10-28 16:06:02 -04:00
Luca Cavanna	2186b75af9	Reduce usages of SearchContext#mapperService (#64250 ) We recently removed getMapperService from QueryShardContext in an attempt to avoid consumers depending on the whole MapperService. SearchContext still has that problem although it is easier to solve as it can delegate to QueryShardContext for the most part, which is what this commit does for most of the existing usages.	2020-10-28 09:55:52 +01:00
Nik Everett	7feb19a74f	Make sure non-collecting aggs include sub-aggs (#64214 ) Now that we're consistently using `can_match` to filter which shards we run on we can get this confusing case: 1. You have a search with, say, a range and a sub-agg. 2. That search has a query that `can_match` can recognize will match no docs. On any shard. 3. So we dutifully run it on a single shard so it can produce the "empty" aggs. 4. The shard we pick happens to not have the target of the range mapped. 5. This kicks in the special range aggregator that doesn't collect any documents. 6. Before this commit, that range aggregator also never produced any sub-aggs. So, without this change, it was quite possible for a search that happened to match no documents to "throw away" the sub-aggs of a range and a few other aggs. We've had this problem for a long, long time but it is more confusing now because `can_match` is really kicking in and causing us to see cases where it looks like you are targeting a lot of shards but you really are only targeting a couple. It used to be that to get the "no sub-aggs" behavior you had to explicitly target only shards that didn't map the target field of the `range` agg. And, like, in that case it isn't too bad because you targeted a sort of degenerate shard. But now that `can_match` is doing its thing you can end up with the confusing steps above. It took me several hours to track down what was happening. I know how the individual pieces of all of this work. It took four hours to figure out how they fit together in this case.... Anyway! This replaces all the aggregator implementations that throw out the sub-aggregators with ones that keep them. I think this'll be less confusing in the future. Closes #64142	2020-10-27 15:45:24 -04:00
Nik Everett	6ef0e5f5e8	Limit blast radius of SearchContext in aggs (#64068 ) This takes away access to the `SearchContext` from all subclasses of `Aggregator`. Now they have access to three things: * BigArrays * The top level Query * The IndexSearcher These are used by a whole bunch of aggs. This is a useful change because `SearchContext` is very large and difficult to mock in tests and difficult to reason about in general. Limiting what aggs can use when they are being collected helps with this. We still pass `SearchContext` to `AggregatorBase`'s ctor so the thing is still around. But we can remove that access in a follow up.	2020-10-27 09:12:58 -04:00
Nik Everett	769e30dd88	Fix broken parent and child aggregator (#63811 ) In #57892 I broke some sub-aggregations inside of the `parent` and `child` aggregator, specifically any sub-aggregations that do work in the `postCollect` phase. This fixes it by delaying the post collect phase of aggs under `parent` and `child` until `beforeBuildingBuckets` because, well, we haven't done any collection until after that phase.	2020-10-19 10:54:09 -04:00
Alan Woodward	b79e6ae8f7	Convert parent-join mappers to parametrized form (#63878 ) This converts the three parent-join mapper implementations to parametrized form; MetaJoinFieldMapper and ParentIdFieldMapper have no builders or merging logic as they are always created directly by the ParentJoinFieldMapper. Relates to #62988	2020-10-19 15:37:47 +01:00
Alan Woodward	70d88ef62d	Rework parent-join to not require access to DocumentMapper (#63738 ) Parent joins work using a cluster of field mappers: the join field itself; a set of subfields that allow multiple relationships between parents and children to be defined; and a metadata field that acts to only allow a single join field per index to be defined. The various queries and aggregations that use this infrastructure retrieve the join field mapper via a static method and then build themselves by pulling individual relationship mappers from this main mapper. Using mappers rather than MappedFieldTypes means that we need to expose DocumentMapper at search time, which is something we are trying to avoid. This commit refactors things so that the join relations are encapsulated in a Joiner object, which lives instead on the MappedFieldType associated with the metadata join field. Rather than using the ParentJoinFieldMapper and connected ParentIdFieldMappers, we can now build queries and aggregations using this Joiner object, retrieved via the QueryShardContext or AggregationContext using a static helper method on Joiner itself.	2020-10-19 12:17:48 +01:00
Luca Cavanna	d126afb2c2	Remove direct dependency between ParserContext and MapperService (#63741 ) ParserContext only needs some small portions of MapperService, and certainly does not need to expose MapperService through its current getter method. With this change we address this by keeping references to the needed components rather than the whole MapperService	2020-10-15 17:45:53 +02:00
Alan Woodward	8b98af24b4	Remove generics from Mapper.Builder (#63623 ) We simplified the generics on Mapper.Builder in #56747, but stopped short of removing them entirely because they were still used in various places in the code. Now that most field mappers have been converted to parametrized form, these generics are no longer useful. There are very few places where a fluent Builder pattern is used, almost all in tests, and these can all be replaced with simple casts; in exchange, we remove lots of visual cruft and clean up a number of warnings.	2020-10-13 17:24:10 +01:00
Nik Everett	4aaffc6a3d	Consider query when optimizing date rounding (#63403 ) Before this change we inspected the index when optimizing `date_histogram` aggregations, precalculating the divisions for the buckets for the entire range of dates on the index so long as there aren't a ton of these buckets. This works very well when you query all of the dates in the index which is quite common - after all, folks frequently want to query a week of data and have daily indices. But it doesn't work as well when the index is much larger than the query. This is quite common when dumping data into ES just to investigate it but less common in the traditional time series use case. But even there it still happens, it is just less impactful. Consider the default query produced by Kibana's Discover app: a range of 15 minutes and an interval of 30 seconds. This optimization saves something like 3 to 12 nanoseconds per document, so that 15 minutes would have to have hundreds of millions of documents for it to be impactful. Anyway, this commit takes the query into account when precalculating the buckets. Mostly this is good when you have "dirty data". Imagine loading 80 billion docs in an index to investigate them. Most of them have dates around 2015 and 2016 but some have dates in 1970 and others have dates in 2030. These outlier dates are "dirty" "garbage". Well, without this change a `date_histogram` across many of these docs is significantly slowed down because we don't precalculate the range due to the outliers. That's just rude! So this change takes the query into account. The bulk of the code change here is plumbing the query into place. It turns out that it's a ton of plumbing, so instead of just adding a `Query` member in hundreds of args replace `QueryShardContext` with a new `AggregationContext` which does two things: 1. Has the top level `Query`. 2. Exposes just the parts of `QueryShardContext` that we actually need to run aggregation. 
This lets us simplify a few tests now and will let us simplify many, many tests later.	2020-10-12 13:11:44 -04:00
Julie Tibshirani	8c56bbc3e6	Add factory methods for common value fetchers. (#63438 ) This PR adds factory methods for the most common implementations: * `SourceValueFetcher.identity` to pass through the source value untouched. * `SourceValueFetcher.toString` to simply convert the source value to a string.	2020-10-08 11:58:36 -07:00
Julie Tibshirani	cc09b6b6a0	Make array value parsing flag more robust. (#63354 ) When constructing a value fetcher, the 'parsesArrayValue' flag must match `FieldMapper#parsesArrayValue`. However there is nothing in code or tests to help enforce this. This PR reworks the value fetcher constructors so that `parsesArrayValue` is 'false' by default. Just as for `FieldMapper#parsesArrayValue`, field types must explicitly set it to true and ensure the behavior is covered by tests. Follow-up to #62974.	2020-10-06 14:42:03 -07:00
Alan Woodward	ce649d07d7	Move FieldMapper#valueFetcher to MappedFieldType (#62974 ) For runtime fields, we will want to do all search-time interaction with a field definition via a MappedFieldType, rather than a FieldMapper, to avoid interfering with the logic of document parsing. Currently, fetching values for runtime scripts and for building top hits responses need to call a method on FieldMapper. This commit moves this method to MappedFieldType, incidentally simplifying the current call sites and freeing us up to implement runtime fields as pure MappedFieldType objects.	2020-10-04 10:47:04 +01:00
Luca Cavanna	daade44174	Share same existsQuery impl throughout mappers (#57607 ) Most of our field types have the same implementation for their `existsQuery` method which relies on doc_values if present, otherwise it queries norms if available or uses a term query against the _field_names meta field. This standard implementation is repeated in many different mappers. There are field types that only query doc_values, because they always have them, and field types that always query _field_names, because they never have norms nor doc_values. We could apply the same standard logic to all of these field types as `MappedFieldType` has the knowledge about what data structures are available. This commit introduces a standard implementation that does the right thing depending on the data structure that is available. With that only field types that require a different behaviour need to override the existsQuery method. At the same time, this no longer forces subclasses to override `existsQuery`, which could be forgotten when needed. To address this we introduced a new test method in `MapperTestCase` that verifies the `existsQuery` being generated and its consistency with the available data structures.	2020-09-23 08:58:09 +02:00
Luca Cavanna	3a9b65733c	Move stored flag from TextSearchInfo to MappedFieldType (#62717 )	2020-09-22 15:41:24 +02:00

201 Commits