Commit Graph

169 Commits

Author SHA1 Message Date
Armin Braun 096b8ccc26
Fix TextFieldMapper Retaining a Reference to its Builder (#77251)
Fixes the text field mapper and the analyzers class that also retained parameter references that go really heavy.
Makes `TextFieldMapper` take hundreds of bytes compared to multiple kb per instance.

closes #73845
2021-09-03 18:44:11 +02:00
Armin Braun 38faeefd85
Fix MatchOnlyTextFieldMapper Retaining a Reference to its Builder (#77201)
Just like #77131 but for the `MatchOnlyTextFieldMapper`. Also, cleaned up a few
other minor things in it to make the constructor code for this class easier to follow.
2021-09-03 10:43:40 +02:00
Nik Everett 429beba517
Centralize doc values checking (#77089)
This adds two utility methods for to validate the parameters to the
`docValueFormat` method and replaces a pile of copy and pasted code with
calls to them. They just emit a standard error message if the any
unsupported parameters are provided.
2021-09-01 09:06:24 -04:00
Christos Soulios 707dd497e4
Add multiple validators to Parameters (#77073)
This PR implements support for multiple validators to a FieldMapper.Parameter.

The Parameter#setValidator method was replaced by Parameter#addValidator that can be called multipled times
to add validation to a parameter.

All validators of a parameter will be executed in the same order as they have been added and if any of them fails all validation will failed.
2021-08-31 21:28:14 +03:00
Rene Groeschke 35ec6f348c
Introduce simple public yaml-rest-test plugin (#76554)
This introduces a basic public yaml rest test plugin that is supposed to be used by external 
elasticsearch plugin authors. This is driven by #76215

- Rename yaml-rest-test to intern-yaml-rest-test
- Use public yaml plugin in example plugins

Co-authored-by: Mark Vieira <portugee@gmail.com>
2021-08-31 08:45:52 +02:00
Luca Cavanna c6641bf00c
Rename ParseContext to DocumentParserContext (#74963)
ParseContext is used to parse documents. It was easily confused with ParserContext (now renamed to MappingParserContext) which is instead used to parse mappings.

To remove any confusion, this commit renames ParseContext to DocumentParserContext and adapts its subclasses accordingly.
2021-07-06 09:15:59 -04:00
Christos Soulios df941367df
Add dimension mapping parameter (#74450)
Added the dimension parameter to the following field types:

    keyword
    ip
    Numeric field types (integer, long, byte, short)

The dimension parameter is of type boolean (default: false) and is used 
to mark that a field is a time series dimension field.

Relates to #74014
2021-06-24 20:16:27 +03:00
Luca Cavanna 7cedc3ec3a
Make Document a top-level class (#74472)
There is no reason for Document to be an inner class of ParseContext, especially as it is public and accessed directly from many different places.

This commit takes it out to its own top-level class file, which has the advantage of simplifying ParseContext which could use some love too.
2021-06-24 10:56:30 +02:00
Adrien Grand a90ca8dfae
Prevent Lucene from automatically flushing in SourceIntervalsSourceTests. (#73990)
Closes #72130
2021-06-10 13:44:28 +02:00
Ryan Ernst ab1a2e4a84
Add precommit task for detecting split packages (#73784)
Modularization of the JDK has been ongoing for several years. Recently
in Java 16 the JDK began enforcing module boundaries by default. While
Elasticsearch does not yet use the module system directly, there are
some side effects even for those projects not modularized (eg #73517).
Before we can even begin to think about how to modularize, we must
Prepare The Way by enforcing packages only exist in a single jar file,
since the module system does not allow packages to coexist in multiple
modules.

This commit adds a precommit check to the build which detects split
packages. The expectation is that we will add the existing split
packages to the ignore list so that any new classes will not exacerbate
the problem, and the work to cleanup these split packages can be
parallelized.

relates #73525
2021-06-08 15:04:23 -07:00
Ryan Ernst 63012c8a40
Move ParseField to o.e.c.xcontent (#73923)
ParseField is part of the x-content lib, yet it doesn't exist under the
same root package as the rest of the lib. This commit moves the class to
the appropriate package.

relates #73784
2021-06-08 13:32:14 -07:00
Alan Woodward 009f23e7a9
Explicitly say if stored fields aren't supported in MapperTestCase (#72474)
MapperTestCase has a check that if a field mapper supports stored fields,
those stored fields are available to index time scripts. Many of our mappers
do not support stored fields, and we try and catch this with an assumeFalse
so that those mappers do not run this test. However, this test is fragile - it
does not work for mappers created with an index version below 8.0, and it
misses mappers that always store their values, e.g. match_only_text.

This commit adds a new supportsStoredField method to MapperTestCase,
and overrides it for those mappers that do not support storing values. It
also adds a minimalStoredMapping method that defaults to the minimal
mapping plus a store parameter, which is overridden by match_only_text
because storing is not configurable and always available on this mapper.
2021-04-30 08:59:56 +01:00
Adrien Grand 1f74b2072f
Enable mixed-version cluster tests for `match_only_text`. (#72102)
These tests can be enabled now that the new field type has been backported to
7.14.
2021-04-29 16:41:39 +02:00
Alan Woodward b27eaa38dc
Remove 'external values', and replace with swapped out XContentParsers (#72203)
The majority of field mappers read a single value from their positioned
XContentParser, and do not need to call nextToken. There is a general
assumption that the same holds for any multifields defined on them, and
so the XContentParser is passed down to their multifields builder as-is.
This assumption does not hold for mappers that accept json objects,
and so we have a second mechanism for passing values around called
'external values', where a mapper can set a specific value on its context
and child mappers can then check for these external values before reading
from xcontent. The disadvantage of this is that every field mapper now
needs to check its context for external values. Because the values are
defined by their java class, we can also know that in the vast majority of
cases this functionality is unused. We have only two mappers that actually
make use of this, CompletionFieldMapper and GeoPointFieldMapper.

This commit removes external values entirely, and replaces it with the ability
to pass a modified XContentParser to multifields. FieldMappers can just check
the parser attached to their context for data and don't need to worry about
multiple sources.

Plugins implementing field mappers will need to take the removal of external
values into account. Implementations that are passing structured objects
as external values should instead use ParseContext.switchParser and
wrap the objects using MapXContentParser.wrapObject().

GeoPointFieldMapper passes on a fake parser that just wraps its input data
formatted as a geohash; CompletionFieldMapper has a slightly more complicated
parser that in general wraps its metadata, but if textOrNull() is called without
the parser being advanced just returns its text input.

Relates to #56063
2021-04-29 09:17:18 +01:00
Alan Woodward e002aa809b
Make FieldNamesFieldMapper responsible for adding its own doc fields (#71929)
The FieldNamesFieldMapper is a metadata mapper defining a field that
can be used for exists queries if a mapper does not use doc values or
norms. Currently, data is added to it via a special method on FieldMapper
that pulls the metadata mapper from a mapping lookup, checks to see
if it is enabled, and then adds the relevant value to a lucene document.

This is one of only two places that pulls a metadata mapper from the
MappingLookup, and it would be nice to remove this method. This commit
refactors field name handling by instead storing the names of fields to
index in the fieldnames field in a set on the ParseContext, and then
building the field itself in FieldNamesFieldMapper.postParse(). This means
that all of the responsibility for enabling indexing, etc, is handled within
the metadata mapper itself.
2021-04-27 16:03:46 +01:00
Adrien Grand 314574026a Make sure there are no merges.
Closes #72130
2021-04-23 14:17:46 +02:00
Luca Cavanna 1d514c53cb
Remove MapperService#parse method (#72080)
We have recently split DocumentMapper creation from parsing Mapping. There was one method leftover that exposed parsing mapping into DocumentMapper, which is generally not needed. Either you only need to parse into a Mapping instance, which is more lightweight, or like in some tests you need to apply a mapping update for which you merge new mappings and get the resulting document mapper. This commit addresses this and removes the method.
2021-04-22 16:08:34 +02:00
Adrien Grand 83113ec8d3
Add `match_only_text`, a space-efficient variant of `text`. (#66172)
This adds a new `match_only_text` field, which indexes the same data as a `text`
field that has `index_options: docs` and `norms: false` and uses the `_source`
for positional queries like `match_phrase`. Unlike `text`, this field doesn't
support scoring.
2021-04-22 08:41:47 +02:00
Alan Woodward 72f9c4c122
Add null-field checks to shape field mappers (#71999)
#71696 introduced a regression to the various shape field mappers,
where they would no longer handle null values. This commit fixes
that regression and adds a testNullValues method to MapperTestCase
to ensure that all field mappers correctly handle nulls.

Fixes #71874
2021-04-21 15:54:22 +01:00
Luca Cavanna 1469e18c98
Add support for script parameter to boolean field mapper (#71454)
Relates to #68984
2021-04-12 10:04:12 +02:00
Jake Landis 279fde375e
Apply REST API compatibility testing for the :modules (#71137) 2021-04-02 11:20:54 -05:00
Alan Woodward 1653f2fe91
Add script parameter to long and double field mappers (#69531)
This commit adds a script parameter to long and double fields that makes
it possible to calculate a value for these fields at index time. It uses the same
script context as the equivalent runtime fields, and allows for multiple index-time
scripted fields to cross-refer while still checking for indirection loops.
2021-03-31 11:14:11 +01:00
Mark Vieira 6339691fe3
Consolidate REST API specifications and publish under Apache 2.0 license (#70036) 2021-03-26 16:20:14 -07:00
Nik Everett 91c700bd99
Super randomized tests for fetch fields API (#70278)
We've had a few bugs in the fields API where is doesn't behave like we'd
expect. Typically this happens because it isn't obvious what we expct. So
we'll try and use randomized testing to ferret out what we want. This adds
a test for most field types that asserts that `fields` works similarly
to `docvalues_fields`. We expect this to be true for most fields.

It does so by forcing all subclasses of `MapperTestCase` to define a
method that makes random values. It declares a few other hooks that
subclasses can override to further randomize the test.

We skip the test for a few field types that don't have doc values:
* `annotated_text`
* `completion`
* `search_as_you_type`
* `text`
We should come up with some way to test these without doc values, even
if it isn't as nice. But that is a problem for another time, I think.

We skip the test for a few more types just because I wanted to cut this
PR in half so we could get to reviewing it earlier. We'll get to those
in a follow up change.

I've filed a few bugs for things that are inconsistent with
`docvalues_fields`. Typically that means that we have to limit the
random values that we generate to those that *do* round trip properly.
2021-03-24 14:16:27 -04:00
Mayya Sharipova 1de0b616eb
Add positive_score_impact to rank_features type (#69994)
rank_features field type misses positive_score_impact parameter
that rank_feature type has. This adds this parameter.

Closes #68619
2021-03-10 14:55:54 -05:00
Alan Woodward 8fba6e4a6d
Handle ignored fields directly in SourceValueFetcher (#68738)
Currently, the value fetcher framework handles ignored fields by reading
the stored values of the _ignored metadata field, and passing these through
on calls to fetchValues(). However, this means that if a document has multiple
values indexed for a field, and one malformed value, then the fields API will
ignore everything, including the valid values, and return an empty list for this
document.

If a document source contains a malformed value, then it must have been
ignored at index time. Therefore, we can safely assume that if we get an
exception parsing values from source at fetch time, they were also ignored
at index time and they can be skipped. This commit moves this exception
handling directly into SourceValueFetcher and ArraySourceValueFetcher,
removing the need to inspect the _ignored metadata and fixing the case
of mixed valid and invalid values.
2021-02-16 15:19:15 +00:00
Alan Woodward dbff7bea37
Rename DocValueFetcher.Leaf to FormattedDocValues (#68818)
Also moves it to a top-level interface in fielddata. It is not only used by
DocValueFetcher any more, and Leaf does not really describe what
it does or what it provides.
2021-02-15 10:03:25 +00:00
Christoph Büscher e2d5183af0
Return structured nested data in ‘fields’ API
At the moment, the ‘fields’ API handles nested fields the same way I handles non-nested object arrays: it just returns them in a flat list. However, the relationship between nested fields is something we should try to preserve, since this is the main purpose of mapping something as “nested” instead of just using an object.

This PR changes this by returning grouped field values that are inside a nested object according to the nested object they initially appear in. Any further object structures inside a nested object are again returned as a flattened list. Fields inside nested fields don’t appear in the flattened response outside of the nested path any more. The grouping of fields inside nested objects is applied recursively if nested mappings are defined inside another nested mapping.

Closes #63709
2021-02-05 11:05:03 +01:00
Mark Vieira a92a647b9f Update sources with new SSPL+Elastic-2.0 license headers
As per the new licensing change for Elasticsearch and Kibana this commit
moves existing Apache 2.0 licensed source code to the new dual license
SSPL+Elastic license 2.0. In addition, existing x-pack code now uses
the new version 2.0 of the Elastic license. Full changes include:

 - Updating LICENSE and NOTICE files throughout the code base, as well
   as those packaged in our published artifacts
 - Update IDE integration to now use the new license header on newly
   created source files
 - Remove references to the "OSS" distribution from our documentation
 - Update build time verification checks to no longer allow Apache 2.0
   license header in Elasticsearch source code
 - Replace all existing Apache 2.0 license headers for non-xpack code
   with updated header (vendored code with Apache 2.0 headers obviously
   remains the same).
 - Replace all Elastic license 1.0 headers with new 2.0 header in xpack.
2021-02-02 16:10:53 -08:00
Mayya Sharipova 76482210b8
Add linear function to rank_feature query (#67438)
This adds a linear function to the set of functions available
for rank_feature query

Closes #49859
2021-01-18 11:44:13 -05:00
gf2121 92f85981a7
Avoid duplicate serialization for TermsQueryBuilder (#67223)
Avoid duplicate serialization for TermsQuery.
2021-01-18 09:04:29 +01:00
Julie Tibshirani 5852fbedf5
Rename QueryShardContext -> SearchExecutionContext. (#67490)
We decided to rename `QueryShardContext` to clarify that it supports all parts
of search request execution. Before there was confusion over whether it should
only be used for building queries, or maybe only used in the query phase. This
PR also updates the javadocs.

Closes #64740.
2021-01-14 09:11:59 -08:00
Jim Ferenczi 6d1f43c6d2
Fix search_as_you_type field with term_vector (#66432)
This commit fixes a bug in the search_as_you_type field that was introduced during
the refactoring of the field mapper. The prefix field that is used internally
by the search_as_you_type mapper doesn't need term vector even if they are activated
on the main field. So this commit ensures that we don't copy the options from the main
field when we create the prefix sub-field.

Closes #66407
2020-12-16 17:04:51 +01:00
Jake Landis c35d7c0f5a
Convert module/mappers-extra to an internal cluster test (#65971)
modules/mappers-extra should be an internal cluster, not
a javaRestTest. This test will work correctly until you
you try to modify the javaRestTest test cluster. Then
it will treat the javaRestTest as an external cluster
to it's own test cluster potentially causing issues with
tests.
2020-12-08 07:59:27 -06:00
Alan Woodward 1a8ce8716d
Restore use of default search and search_quote analyzers (#65491)
In the refactoring of TextFieldMapper, we lost the ability to define
a default search or search_quote analyzer in index settings. This
commit restores that ability, and adds some more comprehensive
testing.

Fixes #65434
2020-11-26 16:57:45 +00:00
Julie Tibshirani f4a462d05e
Simplify how source is passed to fetch subphases. (#65292)
This PR simplifies how the document source is passed to each fetch subphase. A summary of the strategy:
* For each document, we try to eagerly load the source and store it on `HitContext`. Most subphases that access source, like source filtering and highlighting, use `HitContext`. For nested hits, we filter the parent source and also store this source on `HitContext`.
* Only for non-nested documents, we also store the loaded source on `QueryShardContext#lookup`. This allows subphases that access source through `SearchLookup` to use the pre-loaded source when possible. This is now a common occurrence, since runtime fields are supported in the 'fields' option and may soon be supported in highlighting.

There is no longer a special `SearchLookup` just for the fetch phase. This was not necessary and was mostly caused by a misunderstanding of how `QueryShardContext` should be used.

Addresses #62511.
2020-11-20 14:09:41 -08:00
Alan Woodward 9543d3b432
Fix UOE when building exists query for nested search-as-you-type field (#64630)
PrefixFieldType can use the default existsQuery() implementation.

Fixes #64609
2020-11-05 12:08:39 +00:00
Alan Woodward 0fd70ae383
Remove Mapper.BuilderContext (#64625)
Mapper.BuilderContext is a simple wrapper around two objects, some
IndexSettings and a ContentPath. The IndexSettings are the same as
those provided in the ParserContext, so we can simplify things here
by removing them and just passing ContentPath directly to
Mapper.Builder#build()
2020-11-05 10:48:39 +00:00
Alan Woodward f010269ab7
Move index analyzer management to FieldMapper/MapperService (#63937)
Index-time analyzers are currently specified on the MappedFieldType. This
has a number of unfortunate consequences; for example, field mappers that
index data into implementation sub-fields, such as prefix or phrase
accelerators on text fields, need to expose these sub-fields as MappedFieldTypes,
which means that they then appear in field caps, are externally searchable,
etc. It also adds index-time logic to a class that should only be concerned
with search-time behaviour.

This commit removes references to the index analyzer from MappedFieldType.
Instead, FieldMappers that use the terms index can pass either a single analyzer
or a Map of fields to analyzers to their super constructor, which are then
exposed via a new FieldMapper#indexAnalyzers() method; all index-time analysis 
is mediated through the delegating analyzer wrapper on MapperService. 
In a follow-up, this will make it possible to register multiple field analyzers from 
a single FieldMapper, removing the need for 'hidden' mapper implementations 
on text field, parent joins, and elsewhere.
2020-11-04 13:53:09 +00:00
Luca Cavanna 344ad33a16
Remove ValueFetcher depedendency from MapperService (#64524)
The signature of MappedFieldType#valueFetcher requires MapperService as an argument which is unfortunate as that is one of the reasons why FetchContext exposes the whole MapperService.

Such use of MapperService can be replaced with exposing the QueryShardContext which encapsulates the MapperService.
2020-11-04 12:08:34 +01:00
Alan Woodward a5168572d5
Collapse ParametrizedFieldMapper into FieldMapper (#64365)
Now that all our FieldMapper implementations extend ParametrizedFieldMapper,
we can collapse the two classes together, and remove a load of cruft from
FieldMapper that is unused. In particular:

* we no longer need the lucene FieldType field on FieldMapper
* we no longer use clone() for merging, so we can remove it from all impls
* the serialization code in FieldMapper that assumes we're looking at text fields can go
2020-11-02 15:07:52 +00:00
Alan Woodward 4191c72baf
Distinguish between simple matches with and without the terms index (#63945)
We currently use TextSearchInfo to let query parsers know when a field
will support match queries. Some field types (numeric, constant, range)
can produce simple match queries that don't use the terms index, and
it is useful to distinguish between these fields on the one hand and
keyword/text-type fields on the other. In particular, the
SignificantTextAggregation only works on fields that have indexed terms,
but there is currently no efficient way to see this at search time and so
the factory falls back on checking to see if an index analyzer has been
defined, with the result that some nonsensical field types are permitted.

This commit adds a new static TextSearchInfo implementation called
SIMPLE_MATCH_WITHOUT_TERMS that can be returned by field
types with no corresponding terms index. It changes significant text
to check for this rather than for the presence of an index analyzer.

This is a breaking change, in that the significant text agg will now throw
an error up-front if you try and apply it to a numeric field, whereas before
you would get an empty result.
2020-10-27 12:07:51 +00:00
Luca Cavanna b96f26eba2
Remove documentMapperParser method from MapperService (#63938)
MapperService allows to retrieve its internal DocumentMapperParser instance. Such method is only used in tests, and always to parse mappings which is already exposed by MapperService through a specific parse method.

This commit removes the getter for DocumentMapperParser from MapperService in favour of calling MapperService#parse
2020-10-20 20:11:29 +02:00
Alan Woodward 8b98af24b4
Remove generics from Mapper.Builder (#63623)
We simplified the generics on Mapper.Builder in #56747, but stopped short
of removing them entirely because they were still used in various places in
the code. Now that most field mappers have been converted to parametrized
form, these generics are no longer useful. There are very few places where
a fluent Builder pattern is used, almost all in tests, and these can all be
replaced with simple casts; in exchange, we remove lots of visual cruft and
clean up a number of warnings.
2020-10-13 17:24:10 +01:00
Luca Cavanna f491422e1e
Ensure field types consistency on supporting text queries (#63487)
Some supported field types don't support term queries, and throw exception in their termQuery method. That exception is either an IllegalArgumentException or a QueryShardException. There is logic in MatchQuery that skips the field or not depending on the exception that is thrown.

Also, such field types should hold a TextSearchInfo.NONE while that is not always the case.

With this commit we make the following changes:

- streamline using TextSearchInfo.NONE in all field types that don't support text queries
- standardize the exception being thrown when a field type does not support term queries to be IllegalArgumentException. Note that this is not a breaking change as both exceptions previously returned translated to 400 status code.
- Adapt the MatchQuery logic to skip fields that don't support term queries. There is no need to call termQuery passing an empty string and catch exceptions potentially thrown. We can rather check the TextSearchInfo which tells already whether the field supports text queries or not.
- add a test method to MapperTestCase that verifies the consistency of a field type by verifying that it is not searchable whenever it uses TextSearchInfo.NONE, while it is otherwise. This is what triggered all of the above changes.
2020-10-13 11:05:43 +02:00
Julie Tibshirani 62857b49d1
Add support for missing value fetchers. (#63515)
This PR implements value fetching for the following field types:
* `text` phrase and prefix subfields
* `search_as_you_type`, plus its subfields
* `token_count`, which is implemented by fetching doc values

Supporting these types helps ensure that retrieving all fields through
`"fields": ["*"]` doesn't fail because of unsupported value fetchers.
2020-10-12 13:57:29 -07:00
Julie Tibshirani 8c56bbc3e6
Add factory methods for common value fetchers. (#63438)
This PR adds factory methods for the most common implementations:
* `SourceValueFetcher.identity` to pass through the source value untouched.
* `SourceValueFetcher.toString` to simply convert the source value to a string.
2020-10-08 11:58:36 -07:00
Luca Cavanna 95582da9a5
Rename QueryShardContext#fieldMapper to getFieldType (#63399)
Given that we have a class called `FieldMapper` and that the `fieldMapper` method exposed by `QueryShardContext` actually allows to get a `MappedFieldType` given its name, this commit renames such method to `getFieldType`
2020-10-07 16:11:53 +02:00
Julie Tibshirani cc09b6b6a0
Make array value parsing flag more robust. (#63354)
When constructing a value fetcher, the 'parsesArrayValue' flag must match
`FieldMapper#parsesArrayValue`. However there is nothing in code or tests to
help enforce this.

This PR reworks the value fetcher constructors so that `parsesArrayValue` is
'false' by default. Just as for `FieldMapper#parsesArrayValue`, field types must
explicitly set it to true and ensure the behavior is covered by tests.

Follow-up to #62974.
2020-10-06 14:42:03 -07:00
Luca Cavanna ac93ca1819
Remove MapperService argument from IndexFieldData.Builder#build (#63197)
MapperService carries a lot of weight and is only used to determine if loading of field data for the id field is enabled, which can be done in a different way. There was another usage that recently went away with the removal of `TypeFieldMapper`.
2020-10-05 11:45:31 +02:00