Commit Graph

201 Commits

Author SHA1 Message Date
Przemyslaw Gomulka ec7d9d22cd
[Rest Api Compatibility] Enable parent_join inner_hits test (#75560)
The test in 7.x was fixed in #75534
2021-07-21 09:42:11 +02:00
Luca Cavanna c6641bf00c
Rename ParseContext to DocumentParserContext (#74963)
ParseContext is used to parse documents. It was easily confused with ParserContext (now renamed to MappingParserContext) which is instead used to parse mappings.

To remove any confusion, this commit renames ParseContext to DocumentParserContext and adapts its subclasses accordingly.
2021-07-06 09:15:59 -04:00
Przemyslaw Gomulka 5ac94b5263
[Rest Api Compatibility] Enable tests that are already fixed (#74174)
With the types-removal changes now available under REST API compatibility, I have removed the block-list entries for tests that are already fixed.
relates #51816
2021-06-29 09:11:31 +02:00
Ryan Ernst 63012c8a40
Move ParseField to o.e.c.xcontent (#73923)
ParseField is part of the x-content lib, yet it doesn't exist under the
same root package as the rest of the lib. This commit moves the class to
the appropriate package.

relates #73784
2021-06-08 13:32:14 -07:00
Ryan Ernst 68817d7ca2
Rename o.e.common in libs/core to o.e.core (#73909)
When libs/core was created, several classes were moved from server's
o.e.common package, but they were not moved to a new package. Split
packages need to go away long term, so that Elasticsearch can even think
about modularization. This commit moves all the classes under o.e.common
in core to o.e.core.

relates #73784
2021-06-08 09:53:28 -07:00
Julie Tibshirani 59000da936 Fix typo in ParentIdFieldMapper comment 2021-06-07 08:49:58 -07:00
Julie Tibshirani 58c1477095 Adjust REST test skip version for join field retrieval
The version can be updated now that the test was backported.
2021-06-05 17:14:25 -07:00
Julie Tibshirani dc86babfe6
Fix error when fetching values for parent ID join field (#73639)
The parent ID join field is an internal field that links child documents to
their parent. Although it's internal, we include it when listing all field
types. This means a search with `"fields": "*"` can attempt to fetch values from
the parent ID field and fail.

This PR applies a simple fix to return an empty result instead of failing.
2021-06-04 11:26:45 -07:00
Luca Cavanna 05ca9cf876
Remove getMatchingFieldTypes method (#73655)
FieldTypeLookup and MappingLookup expose the getMatchingFieldTypes method to look up matching field types by a string pattern. We have migrated ExistsQueryBuilder to rely on getMatchingFieldNames instead, hence we can go ahead and remove the remaining usages and the method itself.

The remaining usages find specific field types in the mappings, specifically to eagerly load global ordinals and for the join field type. These operations are performed only once when loading the mappings, and may be refactored to work differently in the future. For now, we remove getMatchingFieldTypes and, for the two scenarios mentioned, instead call getMatchingFieldNames("*") and then getFieldType for each returned field name. This is a bit wasteful, but performance can be sacrificed in these scenarios in favour of less code to maintain.
2021-06-03 10:01:22 +02:00
Alan Woodward 3bd594ebe8
Replace simpleMatchToFullName (#72674)
MappingLookup has a method simpleMatchToFullName that attempts
to return all field names that match a given pattern; if no fields match,
then it returns a single-valued collection containing just the pattern that
was originally passed in. This is a fairly confusing semantic.

This PR replaces simpleMatchToFullName with two new methods:

* getMatchingFieldNames(), which returns a set of all mapped field names
  that match a pattern. Calling getFieldType() with a name returned by
  this method is guaranteed to return a non-null MappedFieldType
* getMatchingFieldTypes(), which returns a collection of all MappedFieldTypes
  in a mapping that match the passed-in pattern.

This allows us to clean up several call-sites because we know that
MappedFieldTypes returned from these calls will never be null. It also
simplifies object field exists query construction.
2021-05-13 11:35:23 +01:00
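The lookup contract described in #72674 can be sketched in plain Java. This is a minimal, hypothetical illustration, not the actual MappingLookup code: the class name `FieldLookupSketch`, the map standing in for the mapping, and the `*`-only wildcard matcher are all assumptions. The property it shows is that every name returned by `getMatchingFieldNames` resolves through `getFieldType`, and a pattern that matches nothing yields an empty set rather than echoing the pattern back.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical stand-in for a mapping lookup; not the Elasticsearch class.
final class FieldLookupSketch {
    // Field name -> type name, standing in for MappedFieldType instances.
    private final Map<String, String> fieldTypes;

    FieldLookupSketch(Map<String, String> fieldTypes) {
        this.fieldTypes = Map.copyOf(fieldTypes);
    }

    // Turn a simple pattern where '*' matches any run of characters into a regex,
    // quoting every literal segment so characters like '.' match literally.
    static Pattern toRegex(String pattern) {
        String regex = Arrays.stream(pattern.split("\\*", -1))
            .map(Pattern::quote)
            .collect(Collectors.joining(".*"));
        return Pattern.compile(regex);
    }

    // Returns only mapped names, so getFieldType(name) is never null for them.
    // An unmatched pattern yields an empty set instead of the pattern itself.
    Set<String> getMatchingFieldNames(String pattern) {
        Pattern regex = toRegex(pattern);
        Set<String> matches = new TreeSet<>();
        for (String name : fieldTypes.keySet()) {
            if (regex.matcher(name).matches()) {
                matches.add(name);
            }
        }
        return matches;
    }

    String getFieldType(String name) {
        return fieldTypes.get(name);
    }

    public static void main(String[] args) {
        FieldLookupSketch lookup = new FieldLookupSketch(
            Map.of("title", "text", "title.raw", "keyword", "price", "long"));
        // Resolve every matching name to its type, here for the "*" pattern.
        for (String name : lookup.getMatchingFieldNames("*")) {
            System.out.println(name + " -> " + lookup.getFieldType(name));
        }
    }
}
```

The loop in `main` also mirrors the approach later taken in #73655: call `getMatchingFieldNames("*")` and resolve each returned name with `getFieldType`.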
Alan Woodward f2ac4f9953
Avoid using external values in parent-join and percolator mappers (#71834)
We would like to remove the use of 'external values' in document parsing.
This commit simplifies two of the four places it is currently used, by adding
direct indexValue methods to BinaryFieldMapper and ParentIdFieldMapper.

Relates to #56063
2021-04-20 12:18:42 +01:00
Jake Landis 279fde375e
Apply REST API compatibility testing for the :modules (#71137) 2021-04-02 11:20:54 -05:00
Mark Vieira 6339691fe3
Consolidate REST API specifications and publish under Apache 2.0 license (#70036) 2021-03-26 16:20:14 -07:00
Alan Woodward 19da36ab86
Remove MappedFieldType#setEagerGlobalOrdinals (#70920)
This is the only remaining setter on MappedFieldType, and removing
it makes the base class entirely final. We now only override the
eagerGlobalOrdinals method on types that actually support it.
2021-03-26 17:03:29 +00:00
Luca Cavanna edb42690bc
Split RuntimeFieldType from corresponding MappedFieldType (#70695)
So far the runtime section supports only leaf field types, hence the internal representation is based on `RuntimeFieldType`, which directly extends `MappedFieldType`. This is straightforward, but it is limiting for e.g. an alias field that points to another field, or for object fields that are not directly queryable and hence should not be a MappedFieldType, although their subfields should be.

This commit makes `RuntimeFieldType` an interface, effectively splitting the definition of a runtime field as defined and returned in the mappings from its internal representation in terms of `MappedFieldType`.

The existing runtime script field types still extend `MappedFieldType` and now also implement the new interface, which makes the change rather simple.
2021-03-23 10:57:44 +01:00
Jim Ferenczi ff50da5a77
Remove the _parent_join metadata field (#70143)
This commit removes the metadata field _parent_join
that was needed to ensure that only one join field is used in a mapping.
It is replaced with a validation at the field level.
This change also fixes a [bug](https://github.com/elastic/kibana/issues/92960) in the handling of parent join fields in _field_caps.
This metadata field threw an unexpected exception in [7.11](https://github.com/elastic/elasticsearch/pull/63878)
when checking if the field is aggregatable.
That's now fixed since this unused field has been removed.
2021-03-10 09:19:30 +01:00
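The field-level validation that replaces the `_parent_join` metadata field in #70143 can be sketched as a check over the mapping. In this hypothetical sketch the mapping is reduced to a map of field names to type names, `"join"` is assumed to be the join field's type name, and the exception message is illustrative rather than the one Elasticsearch produces.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical sketch of a "one join field per mapping" check.
final class JoinFieldValidationSketch {

    static void validateSingleJoinField(Map<String, String> fieldTypes) {
        // Collect the names of all fields declared with the join type,
        // sorted so the error message is stable.
        List<String> joinFields = fieldTypes.entrySet().stream()
            .filter(e -> "join".equals(e.getValue()))
            .map(Map.Entry::getKey)
            .sorted()
            .collect(Collectors.toList());
        if (joinFields.size() > 1) {
            throw new IllegalArgumentException(
                "only one join field is allowed per mapping, found " + joinFields);
        }
    }

    public static void main(String[] args) {
        // A single join field passes validation.
        validateSingleJoinField(Map.of("my_join", "join", "title", "text"));
        try {
            validateSingleJoinField(Map.of("a_join", "join", "b_join", "join"));
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```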
Alan Woodward 139ff8657a
Require `meta` field for MappedFieldType to be non-null (#70145)
The transport action for FieldCapabilities assumes the meta field for a MappedFieldType
is traversable. This commit adds a requirement to MappedFieldType itself to ensure that
it is implemented for all subtypes.
2021-03-09 15:40:03 +00:00
Nik Everett 10e2f90560
Speed up aggs with sub-aggregations (#69806)
This allows many of the optimizations added in #63643 and #68871 to run
on aggregations with sub-aggregations. This should:
* Speed up `terms` aggregations on fields with fewer than 1000 values that
  also have sub-aggregations. Locally I see 2 second searches run in 1.2
  seconds.
* Apply that same speedup to `range` and `date_histogram` aggregations but
  it feels less impressive because the point range queries are a little
  slower to get up and go.
* Massively speed up `filters` aggregations with sub-aggregations that
  don't have a `parent` aggregation or collect "other" buckets. Also
  save a ton of memory while collecting them.
2021-03-03 18:04:47 -05:00
Igor Motov 0bbc6addd9
Revert "Remove aggregation's postCollect phase" (#68615)
This partially reverts #64016, adds #67839, and adds
additional tests that would have caught issues with the changes
in #64016. It's mostly Nik's code, I am just cleaning things up
a bit.

Co-authored-by: Nik Everett <nik9000@gmail.com>
2021-02-10 19:12:50 -05:00
Rory Hunter 2d44cce31e
Replace NOT operator with explicit `false` check - part 9 (#68645)
Part 9.

We have an in-house rule to compare explicitly against `false` instead
of using the logical not operator (`!`). However, this hasn't
historically been enforced, meaning that there are many violations in
the source at present.

We now have a Checkstyle rule that can detect these cases, but before we
can turn it on, we need to fix the existing violations. This is being
done over a series of PRs, since there are a lot to fix.
2021-02-08 15:28:57 +00:00
Mark Vieira a92a647b9f Update sources with new SSPL+Elastic-2.0 license headers
As per the new licensing change for Elasticsearch and Kibana this commit
moves existing Apache 2.0 licensed source code to the new dual license
SSPL+Elastic license 2.0. In addition, existing x-pack code now uses
the new version 2.0 of the Elastic license. Full changes include:

 - Updating LICENSE and NOTICE files throughout the code base, as well
   as those packaged in our published artifacts
 - Update IDE integration to now use the new license header on newly
   created source files
 - Remove references to the "OSS" distribution from our documentation
 - Update build time verification checks to no longer allow Apache 2.0
   license header in Elasticsearch source code
 - Replace all existing Apache 2.0 license headers for non-xpack code
   with updated header (vendored code with Apache 2.0 headers obviously
   remains the same).
 - Replace all Elastic license 1.0 headers with new 2.0 header in xpack.
2021-02-02 16:10:53 -08:00
Rory Hunter ad1f876daa
Replace NOT operator with explicit `false` check (#67817)
We have an in-house rule to compare explicitly against `false` instead
of using the logical not operator (`!`). However, this hasn't
historically been enforced, meaning that there are many violations in
the source at present.

We now have a Checkstyle rule that can detect these cases, but before we
can turn it on, we need to fix the existing violations. This is being
done over a series of PRs, since there are a lot to fix.
2021-01-26 14:47:09 +00:00
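The two styles contrasted in this rule can be shown side by side. This is a hypothetical example (the class and method names are invented for illustration); both methods behave identically, and only the second follows the in-house convention of comparing explicitly against `false`.

```java
import java.util.List;
import java.util.Set;

// Hypothetical example of the two styles; both methods return the same result.
final class ExplicitFalseCheckExample {

    // Discouraged: the leading '!' is easy to overlook when reading.
    static int countUnmappedNegated(List<String> fields, Set<String> mapped) {
        int count = 0;
        for (String field : fields) {
            if (!mapped.contains(field)) {
                count++;
            }
        }
        return count;
    }

    // Preferred by the in-house rule: compare explicitly against false.
    static int countUnmappedExplicit(List<String> fields, Set<String> mapped) {
        int count = 0;
        for (String field : fields) {
            if (mapped.contains(field) == false) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        List<String> fields = List.of("title", "price", "tags");
        Set<String> mapped = Set.of("title");
        System.out.println(countUnmappedNegated(fields, mapped) + " " + countUnmappedExplicit(fields, mapped));
    }
}
```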
Julie Tibshirani 5852fbedf5
Rename QueryShardContext -> SearchExecutionContext. (#67490)
We decided to rename `QueryShardContext` to clarify that it supports all parts
of search request execution. Before there was confusion over whether it should
only be used for building queries, or maybe only used in the query phase. This
PR also updates the javadocs.

Closes #64740.
2021-01-14 09:11:59 -08:00
Mark Tozzi e26c9bbd52
Rename BYTES ValuesSourceType to reflect intended usage (#66762) 2020-12-30 12:39:17 -05:00
Julie Tibshirani d0683141f4
Ensure all query builder tests consider older versions. (#66401)
This PR removes outdated overrides in some tests that prevent them from testing
older index versions. Also removes an old comment + logic from
AggregatorFactoriesTests.
2020-12-16 09:19:26 -08:00
Nik Everett 7b3c6f2a0c
Further clean up in AggregatorTestCase (#66395)
Drops `AggregatorTestCase#mapperServiceMock` because it is getting in
the way of other work I'm doing for runtime fields. It was only
overridden to test the `parent` and `child` aggregation to add the
`MappedFieldType`s for join fields in the backdoor. Those aggregations
can just as easily add those fields in the normal method calls.
2020-12-16 11:56:04 -05:00
Armin Braun 06a31a0aca
Add List Append Utility Method (#65576)
The pattern (list -> copy -> add one -> wrap immutable) is pretty common in CS
updates and tests, so this adds a shortcut for it and uses it in easily identifiable
spots.
2020-12-01 02:47:21 +01:00
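The copy-then-append pattern from #65576 can be captured in a small generic helper. This is a sketch; the names `ListUtilsSketch` and `appendToCopy` are hypothetical and not necessarily the ones introduced by the commit.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

// Hypothetical helper illustrating list -> copy -> add one -> wrap immutable.
final class ListUtilsSketch {

    // Copies the collection, appends one element, and wraps the result as unmodifiable.
    // The input collection is left untouched.
    static <E> List<E> appendToCopy(Collection<E> collection, E element) {
        List<E> copy = new ArrayList<>(collection.size() + 1);
        copy.addAll(collection);
        copy.add(element);
        return Collections.unmodifiableList(copy);
    }

    public static void main(String[] args) {
        List<String> nodes = List.of("node-1", "node-2");
        List<String> updated = appendToCopy(nodes, "node-3");
        System.out.println(nodes + " -> " + updated);
    }
}
```

Sizing the copy to `size() + 1` up front avoids a resize on the final `add`, and returning an unmodifiable view keeps callers from mutating what is meant to be a snapshot.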
Nik Everett c227554080
Remove SearchContext from constructing aggregations (#64953)
This replaces the `SearchContext` passed to the ctor of `Aggregation`s
with `AggregationContext`. It ends up adding a fairly large number of
methods to `AggregationContext` but in exchange it shows a path to
removing a few methods from `SearchContext`. That seems nice!

It also gives us an accurate inventory of "all of the stuff" that
aggregations use to build and run.
2020-11-30 13:19:44 -05:00
Julie Tibshirani f4a462d05e
Simplify how source is passed to fetch subphases. (#65292)
This PR simplifies how the document source is passed to each fetch subphase. A summary of the strategy:
* For each document, we try to eagerly load the source and store it on `HitContext`. Most subphases that access source, like source filtering and highlighting, use `HitContext`. For nested hits, we filter the parent source and also store this source on `HitContext`.
* Only for non-nested documents, we also store the loaded source on `QueryShardContext#lookup`. This allows subphases that access source through `SearchLookup` to use the pre-loaded source when possible. This is now a common occurrence, since runtime fields are supported in the 'fields' option and may soon be supported in highlighting.

There is no longer a special `SearchLookup` just for the fetch phase. This was not necessary and was mostly caused by a misunderstanding of how `QueryShardContext` should be used.

Addresses #62511.
2020-11-20 14:09:41 -08:00
Alan Woodward 0fd70ae383
Remove Mapper.BuilderContext (#64625)
Mapper.BuilderContext is a simple wrapper around two objects, some
IndexSettings and a ContentPath. The IndexSettings are the same as
those provided in the ParserContext, so we can simplify things here
by removing them and just passing ContentPath directly to
Mapper.Builder#build()
2020-11-05 10:48:39 +00:00
Luca Cavanna f1e9aec8dc
Replace more MapperService usages in favour of QueryShardContext (#64584)
This commit replaces most of the leftover direct access to MapperService from SearchContext and FetchContext with access to QueryShardContext instead, which wraps the MapperService and exposes the subset of its functionality needed when executing the different phases of search.
2020-11-04 15:49:38 +01:00
Alan Woodward f010269ab7
Move index analyzer management to FieldMapper/MapperService (#63937)
Index-time analyzers are currently specified on the MappedFieldType. This
has a number of unfortunate consequences; for example, field mappers that
index data into implementation sub-fields, such as prefix or phrase
accelerators on text fields, need to expose these sub-fields as MappedFieldTypes,
which means that they then appear in field caps, are externally searchable,
etc. It also adds index-time logic to a class that should only be concerned
with search-time behaviour.

This commit removes references to the index analyzer from MappedFieldType.
Instead, FieldMappers that use the terms index can pass either a single analyzer
or a Map of fields to analyzers to their super constructor, which are then
exposed via a new FieldMapper#indexAnalyzers() method; all index-time analysis 
is mediated through the delegating analyzer wrapper on MapperService. 
In a follow-up, this will make it possible to register multiple field analyzers from 
a single FieldMapper, removing the need for 'hidden' mapper implementations 
on text field, parent joins, and elsewhere.
2020-11-04 13:53:09 +00:00
Luca Cavanna 344ad33a16
Remove ValueFetcher dependency on MapperService (#64524)
The signature of MappedFieldType#valueFetcher requires MapperService as an argument which is unfortunate as that is one of the reasons why FetchContext exposes the whole MapperService.

Such use of MapperService can be replaced with exposing the QueryShardContext which encapsulates the MapperService.
2020-11-04 12:08:34 +01:00
Alan Woodward a5168572d5
Collapse ParametrizedFieldMapper into FieldMapper (#64365)
Now that all our FieldMapper implementations extend ParametrizedFieldMapper,
we can collapse the two classes together, and remove a load of cruft from
FieldMapper that is unused. In particular:

* we no longer need the lucene FieldType field on FieldMapper
* we no longer use clone() for merging, so we can remove it from all impls
* the serialization code in FieldMapper that assumes we're looking at text fields can go
2020-11-02 15:07:52 +00:00
Nik Everett 3af540b50d
Remove aggregation's postCollect phase (#64016)
After #63811 it became clear to me that `postCollect` is kind of
dangerous and not all that useful. So this removes it.

The trouble with `postCollect` is that it happened right after we
finished calling `collect` on the `LeafBucketCollectors` but before we
built the aggregation results. But in #63811 we found out that we can't
call `postCollect` on the children of `parent` or `child` aggregators
until we know *which* aggregation results we're building.

So this removes `postCollect` and moves all of the things we did at
post-collect phase into `buildAggregations` or into hooks called in
those methods.
2020-10-28 17:33:27 -04:00
Nik Everett d2043a4b12 Add more tests for parent/child aggs
I broke the `parent` and `child` agg something fierce in #57892 and
fixed it in #63811. This adds more tests for that fix mimicking other
reported failures.
2020-10-28 16:06:02 -04:00
Luca Cavanna 2186b75af9
Reduce usages of SearchContext#mapperService (#64250)
We recently removed getMapperService from QueryShardContext in an attempt to avoid consumers depending on the whole MapperService. SearchContext still has that problem, although it is easier to solve as it can delegate to QueryShardContext for the most part, which is what this commit does for most of the existing usages.
2020-10-28 09:55:52 +01:00
Nik Everett 7feb19a74f
Make sure non-collecting aggs include sub-aggs (#64214)
Now that we're consistently using `can_match` to filter which shards we
run on we can get this confusing case:
1. You have a search with, say, a range and a sub-agg.
2. That search has a query that `can_match` can recognize will match no
   docs. On *any* shard.
3. So we dutifully run it on a single shard so it can produce the
   "empty" aggs.
4. The shard we pick happens to not have the target of the range mapped.
5. This kicks in the special range aggregator that doesn't collect any
   documents.
6. Before this commit, that range aggregator *also* never produced any
   sub-aggs.

So, without this change, it was quite possible for a search that
happened to match no documents to "throw away" the sub-aggs of a range
and a few other aggs.

We've had this problem for a long, long time but it is more confusing
now because `can_match` is really kicking in and causing us to see cases
where it looks like you are targeting a lot of shards but you really are
only targeting a couple. It used to be that to get the "no sub-aggs"
behavior you had to explicitly target only shards that didn't map the
target field of the `range` agg. And, like, in that case it isn't too
bad because you targeted a sort of degenerate shard. But now that
`can_match` is doing its thing you can end up with the confusing steps
above. It took me several hours to track down what was happening, even though I know
how the individual pieces of all of this work. It took four hours to
figure out how they fit together in this case....

Anyway! This replaces all the aggregator implementations that throw out
the sub-aggregators with ones that keep them. I think this'll be less
confusing in the future.

Closes #64142
2020-10-27 15:45:24 -04:00
Nik Everett 6ef0e5f5e8
Limit blast radius of SearchContext in aggs (#64068)
This takes away access to the `SearchContext` from all subclasses of
`Aggregator`. Now they have access to three things:
* BigArrays
* The top level Query
* The IndexSearcher

These are used by a whole bunch of aggs.

This is a useful change because `SearchContext` is very large and
difficult to mock in tests and difficult to reason about in general.
Limiting what aggs can use when they are being collected helps with
this.

We still pass `SearchContext` to `AggregatorBase`'s ctor so the thing is
still around. But we can remove that access in a follow up.
2020-10-27 09:12:58 -04:00
Nik Everett 769e30dd88
Fix broken parent and child aggregator (#63811)
In #57892 I broke *some* sub-aggregations inside of the `parent` and
`child` aggregator, specifically any sub-aggregations that do work in
the `postCollect` phase. This fixes it by delaying the post collect
phase of aggs under `parent` and `child` until `beforeBuildingBuckets`
because, well, we haven't done *any* collection until after that phase.
2020-10-19 10:54:09 -04:00
Alan Woodward b79e6ae8f7
Convert parent-join mappers to parametrized form (#63878)
This converts the three parent-join mapper implementations to parametrized
form; MetaJoinFieldMapper and ParentIdFieldMapper have no builders or
merging logic as they are always created directly by the ParentJoinFieldMapper.

Relates to #62988
2020-10-19 15:37:47 +01:00
Alan Woodward 70d88ef62d
Rework parent-join to not require access to DocumentMapper (#63738)
Parent joins work using a cluster of field mappers: the join field itself;
a set of subfields that allow multiple relationships between parents and
children to be defined; and a metadata field that acts to only allow a
single join field per index to be defined. The various queries and
aggregations that use this infrastructure retrieve the join field mapper
via a static method and then build themselves by pulling individual
relationship mappers from this main mapper.

Using mappers rather than MappedFieldTypes means that we need to
expose DocumentMapper at search time, which is something we are
trying to avoid. This commit refactors things so that the join relations
are encapsulated in a Joiner object, which lives instead on the
MappedFieldType associated with the metadata join field. Rather than
using the ParentJoinFieldMapper and connected ParentIdFieldMappers,
we can now build queries and aggregations using this Joiner object,
retrieved via the QueryShardContext or AggregationContext using
a static helper method on Joiner itself.
2020-10-19 12:17:48 +01:00
Luca Cavanna d126afb2c2
Remove direct dependency between ParserContext and MapperService (#63741)
ParserContext only needs some small portions of MapperService, and certainly does not need to expose MapperService through its current getter method.

With this change we address this by keeping references to the needed components rather than the whole MapperService.
2020-10-15 17:45:53 +02:00
Alan Woodward 8b98af24b4
Remove generics from Mapper.Builder (#63623)
We simplified the generics on Mapper.Builder in #56747, but stopped short
of removing them entirely because they were still used in various places in
the code. Now that most field mappers have been converted to parametrized
form, these generics are no longer useful. There are very few places where
a fluent Builder pattern is used, almost all in tests, and these can all be
replaced with simple casts; in exchange, we remove lots of visual cruft and
clean up a number of warnings.
2020-10-13 17:24:10 +01:00
Nik Everett 4aaffc6a3d
Consider query when optimizing date rounding (#63403)
Before this change we inspected the index when optimizing
`date_histogram` aggregations, precalculating the divisions for the
buckets for the entire range of dates on the index so long as there
aren't a ton of these buckets. This works very well when you query all
of the dates in the index which is quite common - after all, folks
frequently want to query a week of data and have daily indices.

But it doesn't work as well when the index is much larger than the
query. This is quite common when dumping data into ES just to
investigate it but less common in the traditional time series use case.
But even there it still happens; it is just less impactful. Consider
the default query produced by Kibana's Discover app: a range of 15
minutes and an interval of 30 seconds. This optimization saves something
like 3 to 12 nanoseconds per document, so that 15 minutes would have to
have hundreds of millions of documents for it to be impactful.

Anyway, this commit takes the query into account when precalculating the
buckets. Mostly this is good when you have "dirty data". Imagine
loading 80 billion docs in an index to investigate them. Most of them
have dates around 2015 and 2016 but some have dates in 1970 and
others have dates in 2030. These outlier dates are "dirty" "garbage".
Well, without this change a `date_histogram` across many of these docs
is significantly slowed down because we don't precalculate the range due
to the outliers. That's just rude! So this change takes the query into
account.

The bulk of the code change here is plumbing the query into place. It
turns out that it's a *ton* of plumbing, so instead of adding a
`Query` member to hundreds of argument lists, this replaces `QueryShardContext` with a
new `AggregationContext` which does two things:
1. Has the top level `Query`.
2. Exposes just the parts of `QueryShardContext` that we actually need
   to run aggregations. This lets us simplify a few tests now and will
   let us simplify many, many tests later.
2020-10-12 13:11:44 -04:00
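The effect described in #63403 can be shown with simple arithmetic. This is a hypothetical sketch, not the Elasticsearch implementation: it assumes a fixed interval with timestamps in the same unit (days since the epoch here), it models "taking the query into account" as intersecting the index bounds with the query range, and `maxBuckets` is an illustrative parameter, not the threshold Elasticsearch uses. Calendar intervals such as months are not modelled.

```java
// Hypothetical sketch: decide whether bucket boundaries can be precalculated
// for the range of dates that the aggregation can actually see.
final class RoundingRangeSketch {

    // Number of fixed-interval buckets touched by [min, max], both inclusive.
    static long bucketCount(long min, long max, long interval) {
        if (max < min) {
            return 0; // the index and the query do not overlap
        }
        return Math.floorDiv(max, interval) - Math.floorDiv(min, interval) + 1;
    }

    // Before the change: only the index bounds were considered.
    static boolean canPrecalculateIndexOnly(long indexMin, long indexMax, long interval, long maxBuckets) {
        return bucketCount(indexMin, indexMax, interval) <= maxBuckets;
    }

    // After the change (as modelled here): intersect index bounds with the query range first.
    static boolean canPrecalculateWithQuery(long indexMin, long indexMax, long queryMin, long queryMax,
                                            long interval, long maxBuckets) {
        long min = Math.max(indexMin, queryMin);
        long max = Math.min(indexMax, queryMax);
        return bucketCount(min, max, interval) <= maxBuckets;
    }

    public static void main(String[] args) {
        // Days since the epoch: outliers stretch the index from 1970 to about 2030,
        // while the query covers 2015-01-01 (16436) through 2016-12-31 (17166).
        long indexMin = 0, indexMax = 21_900, queryMin = 16_436, queryMax = 17_166;
        System.out.println(canPrecalculateIndexOnly(indexMin, indexMax, 1, 1_000));
        System.out.println(canPrecalculateWithQuery(indexMin, indexMax, queryMin, queryMax, 1, 1_000));
    }
}
```

With daily buckets, the outliers stretch the index to 21,901 buckets, while the queried two years need only 731, which is why bounding by the query can re-enable the precalculation.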
Julie Tibshirani 8c56bbc3e6
Add factory methods for common value fetchers. (#63438)
This PR adds factory methods for the most common implementations:
* `SourceValueFetcher.identity` to pass through the source value untouched.
* `SourceValueFetcher.toString` to simply convert the source value to a string.
2020-10-08 11:58:36 -07:00
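The two factory methods from #63438 can be sketched with a class that is parameterized by a per-value parser function. This is a hypothetical stand-in: `SourceValueFetcherSketch`, its constructor, and `fetchValues` reading from a plain `Map` are assumptions for illustration, not the Elasticsearch `SourceValueFetcher` API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical stand-in for a source-based value fetcher; not the Elasticsearch class.
final class SourceValueFetcherSketch {
    private final String field;
    private final Function<Object, Object> parser;

    private SourceValueFetcherSketch(String field, Function<Object, Object> parser) {
        this.field = field;
        this.parser = parser;
    }

    // Passes each source value through untouched.
    static SourceValueFetcherSketch identity(String field) {
        return new SourceValueFetcherSketch(field, value -> value);
    }

    // Converts each source value to its string form.
    static SourceValueFetcherSketch toString(String field) {
        return new SourceValueFetcherSketch(field, Object::toString);
    }

    // Reads the field from a parsed source map; arrays are handled value by value,
    // and null values are skipped.
    List<Object> fetchValues(Map<String, Object> source) {
        List<Object> values = new ArrayList<>();
        Object raw = source.get(field);
        if (raw instanceof List<?>) {
            for (Object value : (List<?>) raw) {
                if (value != null) {
                    values.add(parser.apply(value));
                }
            }
        } else if (raw != null) {
            values.add(parser.apply(raw));
        }
        return values;
    }

    public static void main(String[] args) {
        Map<String, Object> source = Map.of("count", 42, "tags", List.of(1, 2));
        System.out.println(identity("count").fetchValues(source));
        System.out.println(toString("tags").fetchValues(source));
    }
}
```

Keeping the per-value conversion in a single function means each field type only has to choose a parser, while the array and missing-value handling lives in one place.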
Julie Tibshirani cc09b6b6a0
Make array value parsing flag more robust. (#63354)
When constructing a value fetcher, the 'parsesArrayValue' flag must match
`FieldMapper#parsesArrayValue`. However there is nothing in code or tests to
help enforce this.

This PR reworks the value fetcher constructors so that `parsesArrayValue` is
'false' by default. Just as for `FieldMapper#parsesArrayValue`, field types must
explicitly set it to true and ensure the behavior is covered by tests.

Follow-up to #62974.
2020-10-06 14:42:03 -07:00
Alan Woodward ce649d07d7
Move FieldMapper#valueFetcher to MappedFieldType (#62974)
For runtime fields, we will want to do all search-time interaction with
a field definition via a MappedFieldType, rather than a FieldMapper, to
avoid interfering with the logic of document parsing. Currently, fetching
values for runtime scripts and for building top hits responses need to
call a method on FieldMapper. This commit moves this method to
MappedFieldType, incidentally simplifying the current call sites and freeing
us up to implement runtime fields as pure MappedFieldType objects.
2020-10-04 10:47:04 +01:00
Luca Cavanna daade44174
Share same existsQuery impl throughout mappers (#57607)
Most of our field types have the same implementation for their `existsQuery` method which relies on doc_values if present, otherwise it queries norms if available or uses a term query against the _field_names meta field. This standard implementation is repeated in many different mappers.

There are field types that only query doc_values, because they always have them, and field types that always query _field_names, because they never have norms nor doc_values. We could apply the same standard logic to all of these field types as `MappedFieldType` has the knowledge about what data structures are available.

This commit introduces a standard implementation that does the right thing depending on the data structure that is available. With that only field types that require a different behaviour need to override the existsQuery method.

At the same time, this no longer forces subclasses to override `existsQuery`, which could be forgotten when needed. To address this we introduced a new test method in `MapperTestCase` that verifies the `existsQuery` being generated and its consistency with the available data structures.
2020-09-23 08:58:09 +02:00
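The shared `existsQuery` decision from #57607 boils down to a fixed order of preference. In this hypothetical sketch, `FieldInfo` is an assumed summary of which data structures a field type has, and the method only names the chosen strategy instead of returning a query.

```java
// Hypothetical sketch of the shared existsQuery decision; it only names
// the strategy that a default implementation would pick.
final class ExistsQuerySketch {

    enum Strategy { DOC_VALUES, NORMS, FIELD_NAMES_TERM }

    // Assumed summary of which data structures a field type has.
    static final class FieldInfo {
        final String name;
        final boolean hasDocValues;
        final boolean hasNorms;

        FieldInfo(String name, boolean hasDocValues, boolean hasNorms) {
            this.name = name;
            this.hasDocValues = hasDocValues;
            this.hasNorms = hasNorms;
        }
    }

    // Prefer doc values, then norms, then a term query on the _field_names field.
    static Strategy existsStrategy(FieldInfo field) {
        if (field.hasDocValues) {
            return Strategy.DOC_VALUES;
        }
        if (field.hasNorms) {
            return Strategy.NORMS;
        }
        return Strategy.FIELD_NAMES_TERM;
    }

    public static void main(String[] args) {
        System.out.println(existsStrategy(new FieldInfo("price", true, false)));
        System.out.println(existsStrategy(new FieldInfo("body", false, true)));
        System.out.println(existsStrategy(new FieldInfo("raw_blob", false, false)));
    }
}
```

Because the choice depends only on what the field type reports, a field type that always has doc values, or never has norms or doc values, falls into the right branch without overriding anything.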
Luca Cavanna 3a9b65733c
Move stored flag from TextSearchInfo to MappedFieldType (#62717) 2020-09-22 15:41:24 +02:00