In this commit we extend the Fields API formats for geo data in order to produce vector tiles features directly
on the data nodes. That helps the vector tile API to reduce the size of the data that needs to pull in order to
create the answer.
ParseContext is used to parse documents. It was easily confused with ParserContext (now renamed to MappingParserContext) which is instead used to parse mappings.
To remove any confusion, this commit renames ParseContext to DocumentParserContext and adapts its subclasses accordingly.
This change removes the current GeometryFormat interface and replace then with a GeometryParser that deals
with Parsing formats and a GeometryFormatFactory that deals with fields API formats.
ParseField is part of the x-content lib, yet it doesn't exist under the
same root package as the rest of the lib. This commit moves the class to
the appropriate package.
relates #73784
When libs/core was created, several classes were moved from server's
o.e.common package, but they were not moved to a new package. Split
packages need to go away long term, so that Elasticsearch can even think
about modularization. This commit moves all the classes under o.e.common
in core to o.e.core.
relates #73784
Extract usage of internal API from TestClustersPlugin and PluginBuildPlugin and related plugins and build logic
This includes a refactoring of ElasticsearchDistribution to handle types
better in a way we can differentiate between supported Elasticsearch
Distribution types supported in TestCkustersPlugin and types only supported
in internal plugins.
It also introduces a set of internal versions of public plugins.
As part of this we also generate the plugin descriptors now.
As a follow up on this we can actually move these public used classes into
an extra project (declared as included build)
We keep LoggedExec and VersionProperties effectively public And workaround for RestTestBase
With the introduction of BKD-based geo shape indexing in #32039, the prefix tree indexing method has
been deprecated. From 8.0.0, it will not be allowed to create new mappings using deprecated parameters.
MapperTestCase has a check that if a field mapper supports stored fields,
those stored fields are available to index time scripts. Many of our mappers
do not support stored fields, and we try and catch this with an assumeFalse
so that those mappers do not run this test. However, this test is fragile - it
does not work for mappers created with an index version below 8.0, and it
misses mappers that always store their values, e.g. match_only_text.
This commit adds a new supportsStoredField method to MapperTestCase,
and overrides it for those mappers that do not support storing values. It
also adds a minimalStoredMapping method that defaults to the minimal
mapping plus a store parameter, which is overridden by match_only_text
because storing is not configurable and always available on this mapper.
The FieldNamesFieldMapper is a metadata mapper defining a field that
can be used for exists queries if a mapper does not use doc values or
norms. Currently, data is added to it via a special method on FieldMapper
that pulls the metadata mapper from a mapping lookup, checks to see
if it is enabled, and then adds the relevant value to a lucene document.
This is one of only two places that pulls a metadata mapper from the
MappingLookup, and it would be nice to remove this method. This commit
refactors field name handling by instead storing the names of fields to
index in the fieldnames field in a set on the ParseContext, and then
building the field itself in FieldNamesFieldMapper.postParse(). This means
that all of the responsibility for enabling indexing, etc, is handled within
the metadata mapper itself.
#71696 introduced a regression to the various shape field mappers,
where they would no longer handle null values. This commit fixes
that regression and adds a testNullValues method to MapperTestCase
to ensure that all field mappers correctly handle nulls.
Fixes#71874
With the exception of geo_point, no geometry field mappers actually support
the use of multifields. However, we happily accept multifield definitions on these
mappers and silently ignore them when it comes to indexing documents. With
this commit, we emit deprecation warnings when a multifield definition is included
on a geometry field mapper that does not in fact support one.
This commit adds the ability to define an index-time geo_point field
with a script parameter, allowing you to calculate points from other
values within the indexed document.
The various geo field mappers are organised in a hierarchy that shares
parsing and indexing code. This ends up over-complicating things,
particularly when we have some mappers that accept multiple values
and others that only accept singletons. It also leads to confusing
behaviour around ignore_malformed behaviour: geo fields will ignore
all values if a single one is badly formed, while all other field mappers
will only ignore the problem value and index the rest. Finally, this
structure makes adding index-time scripts to geo_point needlessly
complex.
This commit refactors the indexing logic of the hierarchy to move the
individual value indexing logic into the concrete implementations,
and aligns the ignore_malformed behaviour with that of other mappers.
It contains two breaking changes:
* The geo field mappers no longer check for external field values on the
parse context. This added considerable complication to the refactored
parse methods, and is unused anywhere in our codebase, but may
impact plugin-based field mappers which expect to use geo fields
as multifields
* The geo_point field mapper now passes geohashes to its multifields
one-by-one, instead of formatting them into a comma-delimited
string and passing them all at once. Completion multifields using
this as an input should still behave as normal because by default
they would split this combined geohash string on the commas in any
case, but keyword subfields may look different.
Fixes#69601
* Warn users if security is implicitly disabled
Elasticsearch has security features implicitly disabled by default for
Basic and Trial licenses, unless explicitly set in the configuration
file.
This may be good for onboarding, but it also lead to unintended insecure
clusters.
This change introduces clear warnings when security features are
implicitly disabled.
- a warning header in each REST response if security is implicitly
disabled;
- a log message during cluster boot.
This commit adds a script parameter to long and double fields that makes
it possible to calculate a value for these fields at index time. It uses the same
script context as the equivalent runtime fields, and allows for multiple index-time
scripted fields to cross-refer while still checking for indirection loops.
We've had a few bugs in the fields API where is doesn't behave like we'd
expect. Typically this happens because it isn't obvious what we expct. So
we'll try and use randomized testing to ferret out what we want. This adds
a test for most field types that asserts that `fields` works similarly
to `docvalues_fields`. We expect this to be true for most fields.
It does so by forcing all subclasses of `MapperTestCase` to define a
method that makes random values. It declares a few other hooks that
subclasses can override to further randomize the test.
We skip the test for a few field types that don't have doc values:
* `annotated_text`
* `completion`
* `search_as_you_type`
* `text`
We should come up with some way to test these without doc values, even
if it isn't as nice. But that is a problem for another time, I think.
We skip the test for a few more types just because I wanted to cut this
PR in half so we could get to reviewing it earlier. We'll get to those
in a follow up change.
I've filed a few bugs for things that are inconsistent with
`docvalues_fields`. Typically that means that we have to limit the
random values that we generate to those that *do* round trip properly.
The transport action for FieldCapabilities assumes the meta field for a MappedFieldType
is traversable. This commit adds a requirement to MappedFieldType itself to ensure that
it is implemented for all subtypes.
This allows many of the optimizations added in #63643 and #68871 to run
on aggregations with sub-aggregations. This should:
* Speed up `terms` aggregations on fields with less than 1000 values that
also have sub-aggregations. Locally I see 2 second searches run in 1.2
seconds.
* Applies that same speedup to `range` and `date_histogram` aggregations but
it feels less impressive because the point range queries are a little
slower to get up and go.
* Massively speed up `filters` aggregations with sub-aggregations that
don't have a `parent` aggregation or collect "other" buckets. Also
save a ton of memory while collecting them.
The ValueFetcher for geo_shape will shortcut the validation of its
source value if it detects that the source format and the requested
format are the same. This worked fine when malformed values were
dealt with by checking the _ignored metadata, but since #68738
we need to always validate source values at fetch time.
This commit removes this special shortcut logic, and adds tests
to check that geo_shape value fetchers do not return malformed
source inputs.
Fixes#69071
The geo_line aggregation incorrectly handled results with missing values. Tests to verify this
behavior were incorrectly checking results. This commit resolves these in three ways
- explicitly testing missing sort field values
- explicitly testing handling missing point field values
- explicitly testing handling mixed scenarios where some documents have one and not the other, or missing both.
Fixes https://github.com/elastic/kibana/issues/90653.
Fixes#69346.
This commit adds support for the Gold+ licensed `geo_line` aggregation.
This aggregation takes a collection of `geo_point` values and constructs a line
according to some sort value. Adding to transforms allows users to create these
potentially expensive lines out of band of visualizations and then do additional aggs/queries
against the pivoted data.
Examples would be:
"Do these daily user paths ever intersect?"
"Does this path enter and leave this area?"
This reduces the ceremony declaring test artifacts for a project.
It also solves an issue with usage of deprecated testRuntime that
testArtifacts extendsFrom which seems not required at all and would have
broke with Gradle 7.0 anyhow
Test artifact resolution is now variant aware which allows us a more adequate
compile and runtime classpath for the consuming projects.
We also Introduce a convention method in the elasticsearch build to declare
test artifact dependencies in an easy way close to how its done by the gradle build in
test fixture plugin.
Furthermore we cleaned up some inconsistent test dependencies declarations when
relying on a project and on its test artifacts