Commit Graph

34 Commits

Author SHA1 Message Date
Jim Ferenczi 3b62de9e78
unsigned longs should be compatible with index sorting (#75599)
This change allows the type `unsigned_long` to be used in the index
sort specification. The internal representation uses plain longs so
we can rely on the existing support for that type.
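A minimal sketch of the feature (index and field names here are hypothetical):

```
// hypothetical example; console syntax
PUT /counters
{
  "settings": {
    "index": {
      "sort.field": "packets",
      "sort.order": "asc"
    }
  },
  "mappings": {
    "properties": {
      "packets": { "type": "unsigned_long" }
    }
  }
}
```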
2021-07-26 16:45:47 +02:00
Luca Cavanna c6641bf00c
Rename ParseContext to DocumentParserContext (#74963)
ParseContext is used to parse documents. It was easily confused with ParserContext (now renamed to MappingParserContext) which is instead used to parse mappings.

To remove any confusion, this commit renames ParseContext to DocumentParserContext and adapts its subclasses accordingly.
2021-07-06 09:15:59 -04:00
Rory Hunter a5d2251064
Order imports when reformatting (#74059)
Change the formatter config to sort / order imports, and reformat the
codebase. We already had a config file for Eclipse users, so Spotless now
uses that.

The "Eclipse Code Formatter" plugin ought to be able to use this file as
well for import ordering, but in my experiments the results were poor.
Instead, use IntelliJ's `.editorconfig` support to configure import
ordering.

I've also added a config file for the formatter plugin.

Other changes:
   * I've quietly enabled the `toggleOnOff` option for Spotless. It was
     already possible to disable formatting for sections using the markers
     for docs snippets, so enabling this option just accepts this reality
     and makes it possible via `formatter:off` and `formatter:on` without
     the restrictions around line length. It should still only be used as
     a very last resort and with good reason.
   * I've removed mention of the `paddedCell` option from the contributing
     guide, since I haven't had to use that option for a very long time. I
     moved the docs to the spotless config.
2021-06-16 09:22:22 +01:00
Rene Groeschke e609e07cfe
Remove internal build logic from public build tool plugins (#72470)
Extract the usage of internal API from TestClustersPlugin and PluginBuildPlugin, and from related plugins and build logic.

This includes a refactoring of ElasticsearchDistribution to handle types
better, in a way that lets us differentiate between Elasticsearch
distribution types supported in TestClustersPlugin and types only supported
in internal plugins.

It also introduces a set of internal versions of public plugins.

As part of this we also generate the plugin descriptors now.

As a follow-up we can move these publicly used classes into
an extra project (declared as an included build).

We keep LoggedExec and VersionProperties effectively public, and keep a workaround for RestTestBase.
2021-05-06 14:02:35 +02:00
Alan Woodward b27eaa38dc
Remove 'external values', and replace with swapped out XContentParsers (#72203)
The majority of field mappers read a single value from their positioned
XContentParser, and do not need to call nextToken. There is a general
assumption that the same holds for any multifields defined on them, and
so the XContentParser is passed down to their multifields builder as-is.
This assumption does not hold for mappers that accept json objects,
and so we have a second mechanism for passing values around called
'external values', where a mapper can set a specific value on its context
and child mappers can then check for these external values before reading
from xcontent. The disadvantage of this is that every field mapper now
needs to check its context for external values. Because the values are
defined by their java class, we can also know that in the vast majority of
cases this functionality is unused. We have only two mappers that actually
make use of this, CompletionFieldMapper and GeoPointFieldMapper.

This commit removes external values entirely, and replaces them with the ability
to pass a modified XContentParser to multifields. FieldMappers can just check
the parser attached to their context for data and don't need to worry about
multiple sources.

Plugins implementing field mappers will need to take the removal of external
values into account. Implementations that are passing structured objects
as external values should instead use ParseContext.switchParser and
wrap the objects using MapXContentParser.wrapObject().

GeoPointFieldMapper passes on a fake parser that just wraps its input data
formatted as a geohash; CompletionFieldMapper has a slightly more complicated
parser that in general wraps its metadata, but if textOrNull() is called without
the parser being advanced just returns its text input.

Relates to #56063
2021-04-29 09:17:18 +01:00
Alan Woodward e002aa809b
Make FieldNamesFieldMapper responsible for adding its own doc fields (#71929)
The FieldNamesFieldMapper is a metadata mapper defining a field that
can be used for exists queries if a mapper does not use doc values or
norms. Currently, data is added to it via a special method on FieldMapper
that pulls the metadata mapper from a mapping lookup, checks to see
if it is enabled, and then adds the relevant value to a lucene document.

This is one of only two places that pulls a metadata mapper from the
MappingLookup, and it would be nice to remove this method. This commit
refactors field name handling by instead storing the names of fields to
index in the fieldnames field in a set on the ParseContext, and then
building the field itself in FieldNamesFieldMapper.postParse(). This means
that all of the responsibility for enabling indexing, etc, is handled within
the metadata mapper itself.
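For context, this is the metadata field behind `exists` queries of this shape (index and field names are hypothetical):

```
// hypothetical example; relies on _field_names when the
// target field has neither doc values nor norms
POST /my-index/_search
{
  "query": {
    "exists": { "field": "user.id" }
  }
}
```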
2021-04-27 16:03:46 +01:00
Alan Woodward 1653f2fe91
Add script parameter to long and double field mappers (#69531)
This commit adds a script parameter to long and double fields that makes
it possible to calculate a value for these fields at index time. It uses the same
script context as the equivalent runtime fields, and allows for multiple index-time
scripted fields to cross-refer while still checking for indirection loops.
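A sketch of such a mapping, assuming a hypothetical `voltage` source field and correction factor:

```
// hypothetical example; the script emits the indexed value
PUT /sensors
{
  "mappings": {
    "properties": {
      "voltage": { "type": "double" },
      "voltage_corrected": {
        "type": "double",
        "script": {
          "source": "emit(doc['voltage'].value * params.correction)",
          "params": { "correction": 1.2 }
        }
      }
    }
  }
}
```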
2021-03-31 11:14:11 +01:00
Nik Everett 91c700bd99
Super randomized tests for fetch fields API (#70278)
We've had a few bugs in the fields API where it doesn't behave like we'd
expect. Typically this happens because it isn't obvious what we expect. So
we'll try and use randomized testing to ferret out what we want. This adds
a test for most field types that asserts that `fields` works similarly
to `docvalue_fields`. We expect this to be true for most fields.

It does so by forcing all subclasses of `MapperTestCase` to define a
method that makes random values. It declares a few other hooks that
subclasses can override to further randomize the test.

We skip the test for a few field types that don't have doc values:
* `annotated_text`
* `completion`
* `search_as_you_type`
* `text`
We should come up with some way to test these without doc values, even
if it isn't as nice. But that is a problem for another time, I think.

We skip the test for a few more types just because I wanted to cut this
PR in half so we could get to reviewing it earlier. We'll get to those
in a follow up change.

I've filed a few bugs for things that are inconsistent with
`docvalue_fields`. Typically that means that we have to limit the
random values that we generate to those that *do* round trip properly.
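The property under test can be pictured with a request like this (hypothetical index and field), where both retrieval mechanisms should return equivalent values:

```
// hypothetical example
POST /my-index/_search
{
  "query": { "match_all": {} },
  "fields": [ "price" ],
  "docvalue_fields": [ "price" ]
}
```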
2021-03-24 14:16:27 -04:00
Alan Woodward 139ff8657a
Require `meta` field for MappedFieldType to be non-null (#70145)
The transport action for FieldCapabilities assumes the meta field for a MappedFieldType
is traversable. This commit adds a requirement to MappedFieldType itself to ensure that
it is implemented for all subtypes.
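For reference, the `meta` map in question is the one declared on a field mapping, e.g. (hypothetical index and field):

```
// hypothetical example; field_caps reports this meta map
PUT /metrics
{
  "mappings": {
    "properties": {
      "latency": {
        "type": "long",
        "meta": { "unit": "ms" }
      }
    }
  }
}
```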
2021-03-09 15:40:03 +00:00
Rene Groeschke bdf229a148
Introduce Internal Test Artifact Plugin (#68766)
This reduces the ceremony of declaring test artifacts for a project.
It also solves an issue with the usage of the deprecated testRuntime
configuration that testArtifacts extended from, which seems not to be
required at all and would have broken with Gradle 7.0 anyhow.

Test artifact resolution is now variant aware, which gives the consuming
projects more accurate compile and runtime classpaths.

We also introduce a convention method in the Elasticsearch build to declare
test artifact dependencies easily, close to how it's done by the Gradle
build in the test fixture plugin.

Furthermore, we cleaned up some inconsistent test dependency declarations
where a project relied both on another project and on its test artifacts.
2021-02-16 14:36:17 +01:00
Alan Woodward dbff7bea37
Rename DocValueFetcher.Leaf to FormattedDocValues (#68818)
Also moves it to a top-level interface in fielddata. It is no longer used
only by DocValueFetcher, and Leaf does not really describe what it does or
what it provides.
2021-02-15 10:03:25 +00:00
Rene Groeschke 5dfa6f46ac
Remove deprecated usage of default configuration (#68575)
This has been deprecated in Gradle before, but we haven't been warned.

Gradle 7.0 will likely introduce a change in behaviour here, so we
should fix the usage of this configuration upfront.

See https://github.com/gradle/gradle/issues/16027 for further information
about the change in Gradle 7.0
2021-02-07 12:08:02 +01:00
Mark Vieira a92a647b9f
Update sources with new SSPL+Elastic-2.0 license headers
As per the new licensing change for Elasticsearch and Kibana this commit
moves existing Apache 2.0 licensed source code to the new dual license
SSPL+Elastic license 2.0. In addition, existing x-pack code now uses
the new version 2.0 of the Elastic license. Full changes include:

 - Updating LICENSE and NOTICE files throughout the code base, as well
   as those packaged in our published artifacts
 - Update IDE integration to now use the new license header on newly
   created source files
 - Remove references to the "OSS" distribution from our documentation
 - Update build time verification checks to no longer allow Apache 2.0
   license header in Elasticsearch source code
 - Replace all existing Apache 2.0 license headers for non-xpack code
   with updated header (vendored code with Apache 2.0 headers obviously
   remains the same).
 - Replace all Elastic license 1.0 headers with new 2.0 header in xpack.
2021-02-02 16:10:53 -08:00
gf2121 92f85981a7
Avoid duplicate serialization for TermsQueryBuilder (#67223)
Avoid duplicate serialization for TermsQuery.
2021-01-18 09:04:29 +01:00
Julie Tibshirani 5852fbedf5
Rename QueryShardContext -> SearchExecutionContext. (#67490)
We decided to rename `QueryShardContext` to clarify that it supports all parts
of search request execution. Before there was confusion over whether it should
only be used for building queries, or maybe only used in the query phase. This
PR also updates the javadocs.

Closes #64740.
2021-01-14 09:11:59 -08:00
Julie Tibshirani f4a462d05e
Simplify how source is passed to fetch subphases. (#65292)
This PR simplifies how the document source is passed to each fetch subphase. A summary of the strategy:
* For each document, we try to eagerly load the source and store it on `HitContext`. Most subphases that access source, like source filtering and highlighting, use `HitContext`. For nested hits, we filter the parent source and also store this source on `HitContext`.
* Only for non-nested documents, we also store the loaded source on `QueryShardContext#lookup`. This allows subphases that access source through `SearchLookup` to use the pre-loaded source when possible. This is now a common occurrence, since runtime fields are supported in the 'fields' option and may soon be supported in highlighting.

There is no longer a special `SearchLookup` just for the fetch phase. This was not necessary and was mostly caused by a misunderstanding of how `QueryShardContext` should be used.

Addresses #62511.
2020-11-20 14:09:41 -08:00
Nik Everett 7ceed1369d
Speed up date_histogram without children (#63643)
This speeds up `date_histogram` aggregations without a parent or
children. This is quite common - it's the aggregation that Kibana's Discover
uses all over the place. Also, we hope to be able to use the same
mechanism to speed aggs with children one day, but that day isn't today.

The kind of speedup we're seeing is fairly substantial in many cases:
```
|                              |                                            |  before |   after |    |
| 90th percentile service time |           date_histogram_calendar_interval | 9266.07 | 1376.13 | ms |
| 90th percentile service time |   date_histogram_calendar_interval_with_tz | 9217.21 | 1372.67 | ms |
| 90th percentile service time |              date_histogram_fixed_interval | 8817.36 | 1312.67 | ms |
| 90th percentile service time |      date_histogram_fixed_interval_with_tz | 8801.71 | 1311.69 | ms | <-- discover's agg
| 90th percentile service time | date_histogram_fixed_interval_with_metrics | 44660.2 | 43789.5 | ms |
```

This uses the work we did in #61467 to precompute the rounding points for
a `date_histogram`. Now, when we know the rounding points we execute the
`date_histogram` as a `range` aggregation. This is nice for a few reasons:
1. We can further rewrite the `range` aggregation (see below).
2. We don't need to allocate a hash to convert rounding points
   to ordinals.
3. We can send precise cardinality estimates to sub-aggs.

Points 2 and 3 above are nice, but most of the speed difference comes from
point 1. Specifically, we now look into executing `range` aggregations as
a `filters` aggregation. Normally the `filters` aggregation is quite slow
but when it doesn't have a parent or any children then we can execute it
"filter by filter" which is significantly faster. So fast, in fact, that
it is faster than the original `date_histogram`.

The `range` aggregation is *fairly* careful in how it rewrites, giving up
on the `filters` aggregation if it won't collect "filter by filter" and
falling back to its original execution mechanism.


So an aggregation like this:

```
POST _search
{
  "size": 0,
  "query": {
    "range": {
      "dropoff_datetime": {
        "gte": "2015-01-01 00:00:00",
        "lt": "2016-01-01 00:00:00"
      }
    }
  },
  "aggs": {
    "dropoffs_over_time": {
      "date_histogram": {
        "field": "dropoff_datetime",
        "fixed_interval": "60d",
        "time_zone": "America/New_York"
      }
    }
  }
}
```

is executed like:

```
POST _search
{
  "size": 0,
  "query": {
    "range": {
      "dropoff_datetime": {
        "gte": "2015-01-01 00:00:00",
        "lt": "2016-01-01 00:00:00"
      }
    }
  },
  "aggs": {
    "dropoffs_over_time": {
      "range": {
        "field": "dropoff_datetime",
        "ranges": [
          {"from": 1415250000000, "to": 1420434000000},
          {"from": 1420434000000, "to": 1425618000000},
          {"from": 1425618000000, "to": 1430798400000},
          {"from": 1430798400000, "to": 1435982400000},
          {"from": 1435982400000, "to": 1441166400000},
          {"from": 1441166400000, "to": 1446350400000},
          {"from": 1446350400000, "to": 1451538000000},
          {"from": 1451538000000}
        ]
      }
    }
  }
}
```

Which in turn is executed like this:

```
POST _search
{
  "size": 0,
  "query": {
    "range": {
      "dropoff_datetime": {
        "gte": "2015-01-01 00:00:00",
        "lt": "2016-01-01 00:00:00"
      }
    }
  },
  "aggs": {
    "dropoffs_over_time": {
      "filters": {
        "filters": {
          "1": {"range": {"dropoff_datetime": {"gte": "2014-12-30 00:00:00", "lt": "2015-01-05 05:00:00"}}},
          "2": {"range": {"dropoff_datetime": {"gte": "2015-01-05 05:00:00", "lt": "2015-03-06 05:00:00"}}},
          "3": {"range": {"dropoff_datetime": {"gte": "2015-03-06 00:00:00", "lt": "2015-05-05 00:00:00"}}},
          "4": {"range": {"dropoff_datetime": {"gte": "2015-05-05 00:00:00", "lt": "2015-07-04 00:00:00"}}},
          "5": {"range": {"dropoff_datetime": {"gte": "2015-07-04 00:00:00", "lt": "2015-09-02 00:00:00"}}},
          "6": {"range": {"dropoff_datetime": {"gte": "2015-09-02 00:00:00", "lt": "2015-11-01 00:00:00"}}},
          "7": {"range": {"dropoff_datetime": {"gte": "2015-11-01 00:00:00", "lt": "2015-12-31 00:00:00"}}},
          "8": {"range": {"dropoff_datetime": {"gte": "2015-12-31 00:00:00"}}}
        }
      }
    }
  }
}
```

And *that* is faster because we can execute it "filter by filter".

Finally, notice the `range` query filtering the data. That is required for
the data set that I'm using for testing. The "filter by filter" collection
mechanism for the `filters` agg needs special case handling when the query
is a `range` query and the filter is a `range` query and they are both on
the same field. That special case handling "merges" the range query.
Without it "filter by filter" collection is substantially slower. Its still
quite a bit quicker than the standard `filter` collection, but not nearly
as fast as it could be.
2020-11-09 14:20:25 -05:00
Alan Woodward 0fd70ae383
Remove Mapper.BuilderContext (#64625)
Mapper.BuilderContext is a simple wrapper around two objects, some
IndexSettings and a ContentPath. The IndexSettings are the same as
those provided in the ParserContext, so we can simplify things here
by removing them and just passing ContentPath directly to
Mapper.Builder#build()
2020-11-05 10:48:39 +00:00
Luca Cavanna 344ad33a16
Remove ValueFetcher dependency from MapperService (#64524)
The signature of MappedFieldType#valueFetcher requires MapperService as an argument, which is unfortunate, as that is one of the reasons why FetchContext exposes the whole MapperService.

Such use of MapperService can be replaced with exposing the QueryShardContext which encapsulates the MapperService.
2020-11-04 12:08:34 +01:00
Mayya Sharipova 0ffbcd3b3c
Disable using unsigned_long in scripts (#64523)
Relates to #64361
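The kind of script access being disabled might look like this (hypothetical names); a request of this shape against an `unsigned_long` field is what's now rejected:

```
// hypothetical example of the now-disabled access
POST /counters/_search
{
  "script_fields": {
    "packets_doubled": {
      "script": { "source": "doc['packets'].value * 2" }
    }
  }
}
```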
2020-11-03 14:20:46 -05:00
Alan Woodward a5168572d5
Collapse ParametrizedFieldMapper into FieldMapper (#64365)
Now that all our FieldMapper implementations extend ParametrizedFieldMapper,
we can collapse the two classes together, and remove a load of cruft from
FieldMapper that is unused. In particular:

* we no longer need the lucene FieldType field on FieldMapper
* we no longer use clone() for merging, so we can remove it from all impls
* the serialization code in FieldMapper that assumes we're looking at text fields can go
2020-11-02 15:07:52 +00:00
Alan Woodward 4191c72baf
Distinguish between simple matches with and without the terms index (#63945)
We currently use TextSearchInfo to let query parsers know when a field
will support match queries. Some field types (numeric, constant, range)
can produce simple match queries that don't use the terms index, and
it is useful to distinguish between these fields on the one hand and
keyword/text-type fields on the other. In particular, the
SignificantTextAggregation only works on fields that have indexed terms,
but there is currently no efficient way to see this at search time and so
the factory falls back on checking to see if an index analyzer has been
defined, with the result that some nonsensical field types are permitted.

This commit adds a new static TextSearchInfo implementation called
SIMPLE_MATCH_WITHOUT_TERMS that can be returned by field
types with no corresponding terms index. It changes significant text
to check for this rather than for the presence of an index analyzer.

This is a breaking change, in that the significant text agg will now throw
an error up-front if you try and apply it to a numeric field, whereas before
you would get an empty result.
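For example, a request of this shape now fails up-front if `content` is a numeric field, rather than returning empty buckets (index and field names are hypothetical):

```
// hypothetical example
POST /news/_search
{
  "size": 0,
  "query": { "match": { "content": "elasticsearch" } },
  "aggs": {
    "keywords": {
      "significant_text": { "field": "content" }
    }
  }
}
```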
2020-10-27 12:07:51 +00:00
Mayya Sharipova 072da36edc
Fix max/min aggs for unsigned_long (#63904)
Max and min aggs were producing wrong results for an unsigned_long field
if the field was indexed. When a field is indexed, max/min aggs use values
from the indexed points instead of field data; those values are derived
using the method pointReaderIfPossible. Before,
UnsignedLongFieldType#pointReaderIfPossible produced incorrect
values, as it failed to shift them back to the original values.

This patch fixes pointReaderIfPossible to produce the
correct original values.

Relates to #60050
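A minimal reproduction of the affected aggregations (hypothetical index and field):

```
// hypothetical example
POST /counters/_search
{
  "size": 0,
  "aggs": {
    "max_packets": { "max": { "field": "packets" } },
    "min_packets": { "min": { "field": "packets" } }
  }
}
```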
2020-10-19 15:52:27 -04:00
Mayya Sharipova a11f2ae9f2
Modify unsigned_long tests for broader ranges (#63782)
UnsignedLongTests for the range agg was using very specific intervals
that the double type can not distinguish due to lack of precision:

9.223372036854776000E18 == 9.223372036854775807E18 returns true

If we add the corresponding range query test, it will return a different
number of hits than the range agg, as the range query, unlike the range agg,
doesn't convert values to the double type and is hence more precise.

This patch makes the ranges for the range agg test broader (so values
converted to doubles don't lose precision), so the corresponding
range query will return the same number of hits.

Relates to #60050
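As a sketch of the adjusted shape (hypothetical names and values), bucket bounds such as 2^62 below are exactly representable as doubles, so the agg and the equivalent range queries agree:

```
// hypothetical example; bounds chosen to be exact doubles
POST /counters/_search
{
  "size": 0,
  "aggs": {
    "packet_ranges": {
      "range": {
        "field": "packets",
        "ranges": [
          { "from": 0, "to": 4611686018427387904 },
          { "from": 4611686018427387904 }
        ]
      }
    }
  }
}
```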
2020-10-16 12:08:32 -04:00
Mayya Sharipova 3d3837da14
Enable collapse on unsigned_long field (#63495)
Collapse was not working on the unsigned_long field,
as collapsing was enabled only on KeywordFieldType and NumberFieldType.

This introduces a new method `collapseType` to MappedFieldType,
that is checked to decide if collapsing should be enabled.

Relates to #60050
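A collapse request of this shape now works on an `unsigned_long` field (hypothetical names):

```
// hypothetical example
POST /counters/_search
{
  "query": { "match_all": {} },
  "collapse": { "field": "packets" }
}
```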
2020-10-16 10:06:35 -04:00
Luca Cavanna f491422e1e
Ensure field types consistency on supporting text queries (#63487)
Some supported field types don't support term queries, and throw an exception in their termQuery method. That exception is either an IllegalArgumentException or a QueryShardException. There is logic in MatchQuery that skips the field or not depending on the exception that is thrown.

Also, such field types should hold a TextSearchInfo.NONE, but that is not always the case.

With this commit we make the following changes:

- streamline using TextSearchInfo.NONE in all field types that don't support text queries
- standardize the exception being thrown when a field type does not support term queries to be IllegalArgumentException. Note that this is not a breaking change, as both exceptions previously thrown translated to a 400 status code.
- Adapt the MatchQuery logic to skip fields that don't support term queries. There is no need to call termQuery passing an empty string and catch potentially thrown exceptions. We can instead check the TextSearchInfo, which already tells us whether the field supports text queries or not.
- add a test method to MapperTestCase that verifies the consistency of a field type by verifying that it is not searchable whenever it uses TextSearchInfo.NONE, and is searchable otherwise. This is what triggered all of the above changes.
2020-10-13 11:05:43 +02:00
Julie Tibshirani cc09b6b6a0
Make array value parsing flag more robust. (#63354)
When constructing a value fetcher, the 'parsesArrayValue' flag must match
`FieldMapper#parsesArrayValue`. However there is nothing in code or tests to
help enforce this.

This PR reworks the value fetcher constructors so that `parsesArrayValue` is
'false' by default. Just as for `FieldMapper#parsesArrayValue`, field types must
explicitly set it to true and ensure the behavior is covered by tests.

Follow-up to #62974.
2020-10-06 14:42:03 -07:00
Mayya Sharipova c45724079c
Fix fields retrieval on unsigned_long field (#63119)
This fixes fields retrieval on the unsigned_long field:

1) For docvalue_fields, a custom UnsignedLongLeafFieldData::getLeafValueFetcher
is implemented that correctly retrieves doc values.

2) For stored fields, an error was fixed in how UnsignedLongFieldMapper
stored values. Before, they were incorrectly stored in the shifted
format; now they are stored as the original values in String format.

Relates to #60050
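Both retrieval paths can be exercised with a request like this (hypothetical names; `stored_fields` assumes the field was mapped with `"store": true`):

```
// hypothetical example
POST /counters/_search
{
  "docvalue_fields": [ "packets" ],
  "stored_fields": [ "packets" ]
}
```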
2020-10-06 05:44:50 -04:00
Luca Cavanna ac93ca1819
Remove MapperService argument from IndexFieldData.Builder#build (#63197)
MapperService carries a lot of weight and is only used to determine if loading of field data for the id field is enabled, which can be done in a different way. There was another usage that recently went away with the removal of `TypeFieldMapper`.
2020-10-05 11:45:31 +02:00
Alan Woodward ce649d07d7
Move FieldMapper#valueFetcher to MappedFieldType (#62974)
For runtime fields, we will want to do all search-time interaction with
a field definition via a MappedFieldType, rather than a FieldMapper, to
avoid interfering with the logic of document parsing. Currently, fetching
values for runtime scripts and for building top hits responses need to
call a method on FieldMapper. This commit moves this method to
MappedFieldType, incidentally simplifying the current call sites and freeing
us up to implement runtime fields as pure MappedFieldType objects.
2020-10-04 10:47:04 +01:00
Mayya Sharipova 40d528995e
Small refactoring of unsigned_long (#62904)
- Modify Lucene::readSortValue to read BigInteger as a sortValue.
In #60050 writeSortValue was modified, but readSortValue was forgotten.

- Adjust yml tests to v7.10 as unsigned_long has been backported

- Change MapperTests to use proper document IDs

Relates to #60050
2020-09-30 08:52:26 -04:00
Mayya Sharipova 0219ac39b6
Fix UnsignedLongTests test failure (#63056)
Test testSortDifferentFormatsShouldFail was occasionally failing for
two reasons:
1) Documents in "idx2" were not available for search before a
search request started.
2) Running the test multiple times caused an
occasional ResourceAlreadyExistsException for idx2,
as idx2 was not deleted after a test.

This patch makes the following fixes:
1) Sets up an immediate refresh policy for docs in the index "idx2".
2) Creates the index idx2 only once per cluster.

Closes: #62997
2020-09-30 08:26:46 -04:00
Alan Woodward 118fa77a31
Add parameter update and conflict tests to MapperTestCase (#62828)
This commit adds a mechanism to MapperTestCase that allows implementing
test classes to check that their parameters can be updated, or throw conflict
errors as advertised. Child classes override the registerParameters method
and tell the passed-in UpdateChecker class about their parameters. Simple
conflicts can be checked, using the existing minimal mappings as a base to
compare against, or alternatively a particular initial mapping can be provided
to check edge cases (eg, norms can be updated from true to false, but not
vice versa). Updates are registered with a predicate that checks that the update
has in fact been applied to the resulting FieldMapper.

Fixes #61631
2020-09-24 19:39:44 +01:00
Mayya Sharipova ff55296f7a
Introduce 64-bit unsigned long field type (#60050)
This field type supports
- indexing of integer values in the range [0, 18446744073709551615]
- precise queries (term, range)
- precise sort and terms aggregations
- other aggregations are based on conversion of long values
  to double and can be imprecise for large values.

Closes #32434
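A sketch of the new type in use (hypothetical index and field), with a range query above Long.MAX_VALUE that the type can answer precisely:

```
// hypothetical example
PUT /counters
{
  "mappings": {
    "properties": {
      "packets": { "type": "unsigned_long" }
    }
  }
}

POST /counters/_search
{
  "query": {
    "range": { "packets": { "gte": "9223372036854775808" } }
  }
}
```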
2020-09-23 12:06:21 -04:00