This change adds support for the 7 different runtime field contexts to the Painless Execute API. Each
context accepts the standard script input (`source` and `params`) along with a user-defined document
and an index name to pull mappings from. The shape of the result depends on the runtime field
type.
Closes #70467
Today we have no tests that directly verify that `SnapshotInfo`
correctly round-trips through its in-repository `XContent`-based
representation. We do have tests that verify that it correctly
round-trips through the wire format, but these tests always use an empty
collection for `featureStates`.
This commit strengthens the wire format round-trip tests to use
nontrivial `featureStates`, and generalizes these tests to also verify
that the `XContent` representation is faithful.
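The sketch below shows why a round-trip test needs a non-empty collection. It assumes a simplified length-prefixed encoding built on `java.io.DataOutputStream`. `FeatureStatesRoundTrip` is a hypothetical stand-in, not Elasticsearch's wire format and not `SnapshotInfo`. If the list is always empty, `read` could skip every entry and the check would still pass.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the featureStates part of SnapshotInfo.
final class FeatureStatesRoundTrip {

    // Length prefix followed by each entry as modified UTF-8.
    static byte[] write(List<String> featureStates) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeInt(featureStates.size());
            for (String state : featureStates) {
                out.writeUTF(state);
            }
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static List<String> read(byte[] data) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
            int size = in.readInt();
            List<String> states = new ArrayList<>(size);
            for (int i = 0; i < size; i++) {
                states.add(in.readUTF());
            }
            return states;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Only a non-empty input actually exercises the per-entry encoding.
    static boolean roundTrips(List<String> featureStates) {
        return featureStates.equals(read(write(featureStates)));
    }
}
```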
This commit allows you to set 'script' and 'on_script_error' parameters
on date field mappers, meaning that runtime date fields can be made indexed
simply by moving their definitions from the runtime section of the mappings
to the properties section.
This commit removes the ability for plugins to add roles. Roles are
fairly tightly coupled with the behavior of the system (as evidenced by
the fact that some roles from the default distribution leaked behavior
into the OSS distribution). We previously had this plugin extension
point so that we could support a difference in the set of roles between
the OSS and default distributions. We no longer need to maintain that
differentiation, and can therefore remove this plugin extension
point. This was technical debt that we were willing to accept to allow
the default distribution to have additional roles, but now we no longer
need to be encumbered with that technical debt.
We recently upgraded to Gradle 7.0 (#69096), which turned on module
inference by default. I'm sure that's lovely, but it's broken Eclipse. We
should probably support modules one day, but that day ain't today. This
turns off module inference for Eclipse so it can continue compiling
things just like it did yesterday.
Closes #71648
This change adds the ability to call `value` on an `XContentBuilder` with a `boolean[]`. This type was
missing from the set of writers used when `value` is called with a value of otherwise unknown type.
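Here is a rough sketch of a writer table keyed by the value's runtime class, with a `boolean[]` entry of the kind this change adds. `MiniValueWriter` and its output format are hypothetical. It is not `XContentBuilder`'s actual implementation. It only shows how a type without a registered writer gets rejected.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiConsumer;

// Hypothetical miniature of a builder that picks a writer by the value's runtime class.
final class MiniValueWriter {
    private static final Map<Class<?>, BiConsumer<StringBuilder, Object>> WRITERS = new HashMap<>();

    static {
        WRITERS.put(Boolean.class, (sb, v) -> sb.append(v));
        // The kind of entry this change adds: a boolean[] is written as an array.
        WRITERS.put(boolean[].class, (sb, v) -> {
            boolean[] values = (boolean[]) v;
            sb.append('[');
            for (int i = 0; i < values.length; i++) {
                if (i > 0) {
                    sb.append(',');
                }
                sb.append(values[i]);
            }
            sb.append(']');
        });
    }

    static String value(Object value) {
        if (value == null) {
            return "null";
        }
        BiConsumer<StringBuilder, Object> writer = WRITERS.get(value.getClass());
        if (writer == null) {
            // Without a registered writer, the type is rejected.
            throw new IllegalArgumentException("cannot write value of type " + value.getClass().getName());
        }
        StringBuilder sb = new StringBuilder();
        writer.accept(sb, value);
        return sb.toString();
    }
}
```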
copy_to is currently implemented at document parse time, and does not
work with values generated from index-time scripts. We may want to add
this functionality in the future, but for now this commit ensures that we throw
an exception if copy_to and script are both set on a field mapper.
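The rule can be sketched as a single up-front check. `ScriptCopyToCheck.validate` and its error message are hypothetical. The real field mapper builders are not shown here, and their parameters and wording may differ.

```java
import java.util.List;

final class ScriptCopyToCheck {
    // Hypothetical validation: copy_to runs at document parse time, so it never sees
    // values produced by an index-time script. Reject the combination up front.
    static void validate(String fieldName, List<String> copyTo, String script) {
        if (script != null && !copyTo.isEmpty()) {
            throw new IllegalArgumentException(
                "Cannot define both [script] and [copy_to] on field [" + fieldName + "]");
        }
    }
}
```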
We accept dates with a decimal point like `2113413.13241324` and parse
them *somehow*. But there are cases where we'll lose precision on those
dates, see #70085. This advises folks not to use that format. We'll
continue to accept those dates for backwards compatibility but you
should avoid using them.
Co-authored-by: Adrien Grand <jpountz@gmail.com>
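This sketch illustrates one way such precision can be lost. It assumes the decimal value is read as epoch milliseconds and converted to nanoseconds. The class and both methods are hypothetical, and nothing here says that Elasticsearch's date parser works this way. A `double` carries only about 16 significant decimal digits, so the nanosecond digits of a present-day timestamp cannot survive a trip through one. `BigDecimal` keeps them exactly.

```java
import java.math.BigDecimal;

// Hypothetical helpers contrasting exact and double-based parsing of a
// decimal epoch-milliseconds string such as "1617036553123.456789".
final class DecimalEpochPrecision {

    // Exact: BigDecimal keeps every digit; shift six places to get nanoseconds.
    static long exactNanos(String epochMillis) {
        return new BigDecimal(epochMillis).movePointRight(6).longValueExact();
    }

    // Lossy: a double carries roughly 16 significant digits, so the trailing
    // digits of a 19-digit nanosecond value are rounded away.
    static long viaDoubleNanos(String epochMillis) {
        return (long) (Double.parseDouble(epochMillis) * 1_000_000);
    }
}
```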
A CounterMetric is used to track the number of completed and outstanding
items, for example, the number of executed refreshes, the memory currently
used by indexing, or the number of pending search requests. In all cases,
the count of a CounterMetric should never be negative.
However, as this metric is implemented using a LongAdder, the returned
count is NOT an atomic snapshot; invocation in the absence of concurrent
updates returns an accurate result, but concurrent updates that occur
while the sum is being calculated might not be incorporated.
We could replace LongAdder with AtomicLong, but this commit instead keeps
LongAdder and returns 0 when the sum is negative.
Relates #52411
Closes #70968
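A minimal sketch of the approach, assuming a counter with `inc`/`dec`/`count` methods. `CounterMetricSketch` is illustrative and is not the actual `CounterMetric` source. Here `count()` clamps the non-atomic `LongAdder.sum()` at zero.

```java
import java.util.concurrent.atomic.LongAdder;

// Sketch of a counter that reports a non-negative count even though
// LongAdder.sum() is not an atomic snapshot under concurrent updates.
final class CounterMetricSketch {
    private final LongAdder counter = new LongAdder();

    void inc() { counter.increment(); }

    void inc(long n) { counter.add(n); }

    void dec() { counter.decrement(); }

    void dec(long n) { counter.add(-n); }

    long count() {
        // A concurrent sum() may include a decrement while missing the increment
        // that preceded it; the true count is never negative, so clamp to 0.
        return Math.max(0L, counter.sum());
    }
}
```

The alternative, `AtomicLong`, would give exact snapshots but funnels every update through a single contended variable. Clamping keeps `LongAdder`'s cheap concurrent updates.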
* Warn users if security is implicitly disabled
Elasticsearch has security features implicitly disabled by default for
Basic and Trial licenses, unless they are explicitly enabled in the
configuration file.
This may be good for onboarding, but it also leads to unintentionally
insecure clusters.
This change introduces clear warnings when security features are
implicitly disabled:
- a warning header in each REST response if security is implicitly
disabled;
- a log message during cluster boot.
This change fixes a number of problems in the GeoIPv2 code:
- closes streams from `Files.list` in `GeoIpCli`, which should fix tests on Windows
- makes sure that the total download time in GeoIP stats is non-negative (we serialize it as a vInt, which can cause problems with negative numbers, and it can become negative when the clock changes during an operation)
- fixes handling of failed and simultaneous downloads: #69951 was meant to prevent 2 persistent tasks from indexing chunks, but it would prevent any update if a single download failed mid-indexing. This change uses the timestamp (`lastUpdate`) as a sort of UUID. This should still prevent 2 tasks from stepping on each other's toes (overwriting chunks), and in the end only a single task should be able to update the task state (this is handled by the persistent tasks framework)
Closes #71145
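The sketch below relates to the download-time fix. It uses a generic 7-bits-per-byte vInt encoding as an assumption, not Elasticsearch's `StreamOutput` code. The commit only says negative values "can cause problems". In this generic scheme the unsigned shift means any negative `int` always takes the maximum 5 bytes. The hypothetical `elapsedMillis` helper clamps wall-clock differences that go backwards.

```java
import java.io.ByteArrayOutputStream;

final class VIntSketch {
    // Variable-length int encoding: 7 bits per byte, high bit set on all but the last byte.
    // The unsigned shift means a negative int fills all 32 bits and needs 5 bytes.
    static byte[] writeVInt(int i) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((i & ~0x7F) != 0) {
            out.write((i & 0x7F) | 0x80);
            i >>>= 7;
        }
        out.write(i);
        return out.toByteArray();
    }

    // Wall-clock timestamps can go backwards if the clock is adjusted mid-download,
    // so clamp the elapsed time to [0, Integer.MAX_VALUE] before accumulating it.
    static int elapsedMillis(long startMillis, long endMillis) {
        return (int) Math.min(Integer.MAX_VALUE, Math.max(0L, endMillis - startMillis));
    }
}
```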
Now that we have a feature reset API, we should use
this for cleaning up in between tests instead of running
lots of bespoke cleanup code.
During testing of this change we found we need to
delete custom cluster state as part of the reset process,
so this PR also implements that.
Additionally we no longer assign persistent tasks
during feature reset.
The logic for interpreting the primary routing entry was needlessly duplicated
across snapshotting and cloning. This also deduplicates the logic for determining
whether there is an active deletion for a repository, since we were doing that in
a number of places as well.
* Formatting `SnapshotsInProgress` in a more reasonable way instead of mixing nested class, fields and methods.
* Removing pointless constants for the x-content serialization.
* Cleaning up double wrapping in `unmodifiableList`.
* Removing unused and incorrectly documented constructor from `SnapshotDeletionsInProgress`
Runtime fields are much more flexible than script_fields because you
can filter and aggregate on them, so we hope folks use them! This
converts the example of using a `parent_join` field in a script to a
runtime field so folks get used to seeing them and hopefully using them.
While I was editing this I took the opportunity to replace the script
with a real-ish example. Scripts that just load the field value are nice
and short but I hope no one uses them in real life because they just add
overhead when compared to accessing the field directly. So I made the
script do something.
Relates to #69291
This commit allows you to set 'script' and 'on_script_error' parameters
on IP field mappers, meaning that runtime IP fields can be made indexed
simply by moving their definitions from the runtime section of the mappings
to the properties section.
This fixes the `global` aggregator when `profile` is enabled. It does so
by removing all of the special case handling for `global` aggs in
`AggregationPhase` and having the global aggregator itself perform the
scoped collection using the same trick that we use in filter-by-filter
mode of the `filters` aggregation.
Closes #71098
The frozen tier only holds shared cache searchable snapshots. This
commit adds an autoscaling decider that scales the total memory on
the tier adequately to hold the shards. A frozen shard is assigned
a memory size of 64GB/2000, i.e., each 64GB node can hold 2000 shards
before scaling further.
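This sketch works through the arithmetic of the decider. It assumes "64GB" means 64 GiB, and `FrozenMemorySketch` is hypothetical. Each shard gets 64GB/2000 of memory, so the tier total is `shards * 64GB / 2000`, rounded up to a whole byte. Multiplying before dividing avoids the fractional per-shard size.

```java
final class FrozenMemorySketch {
    // "64GB" read as 64 GiB (assumption).
    static final long NODE_MEMORY_BYTES = 64L * 1024 * 1024 * 1024;
    static final long SHARDS_PER_NODE = 2000;

    // Total tier memory for the given number of frozen shards:
    // shardCount * (64GB / 2000), rounded up.
    static long requiredMemoryBytes(long shardCount) {
        return (shardCount * NODE_MEMORY_BYTES + SHARDS_PER_NODE - 1) / SHARDS_PER_NODE;
    }
}
```

For example, 2,000 shards call for exactly one 64GB node's worth of memory, and 3,000 shards call for 96GB in total.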
If enabled, the `delete_searchable_snapshot` option will attempt to delete the
snapshot of the index that was taken in any previous phase in order to mount the
index as a searchable snapshot.
Write out the formatter config using the latest Eclipse. This has the
effect of configuring assertion formatting properly, which has improved
how some of our assertion messages are formatted. Also reconfigure how
annotations are formatted, so that they are correctly line-wrapped.
We recently added script stats to the existing field type stats. That change has now been backported, so master no longer needs to handle the case where this information is unavailable; only the 7.x branch does, to account for mixed-cluster scenarios.
Relates to #71219
Ensure that the index request is routed to the ingest node,
so that the GeoIP database is lazily loaded on the
ingest node (which is what is asserted later on).
Otherwise the database is lazily loaded on a different node.
(Without this fix, this test fails reproducibly with
`-Dtests.seed=2E234CC71CE96F4F`.)
Closes #71251
- Update Gradle wrapper to Gradle 7.0
- Remove deprecated usages to make the build Gradle 7.0 compatible
- Fix excludes in docs snippet tasks (See https://github.com/gradle/gradle/issues/16160 for details)
- Fix deprecation warnings in 7.0
- Add explicit dependencies that were previously missing
- Make the output dir of the extract native licenses tasks more explicit
- Use a snapshot of the ospackage plugin that already includes a fix for 7.0
- Fix test runtime classpath setup in repository-hdfs
- Make task dependency explicit to fix further deprecation warnings
- Remove manual check for http repo usages that has been deprecated in Gradle 7.0
- Update Spock to the latest 2.0 milestone, required for Groovy 3
This PR makes sure that the file and native realms are always added to the
beginning of the realm chain unless explicitly disabled.
Currently, they are only implicitly added when:
* No other realms are configured
* No configured realms can be used with the current license (so an expired license
can fall back to these basic realms)
A side effect (intended?) is that the file and native realms cannot be truly
disabled at all times, because the above two rules always apply regardless of
whether the realms are disabled or not.
This PR makes the behaviour more explicit. If the file or native realm is
explicitly disabled, it will be disabled at all times. If they are not
explicitly disabled, they will always be added to the beginning of the realm
chain. Two scenarios are possible:
* File or native realm is explicitly configured. In this case, their order
value must be provided and honoured
* File or native realm is not configured. In this case, they are implicitly
added to the beginning of the realm chain (file then native).
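The rules above can be sketched as follows. The `RealmConfig` holder and the implicit realm names `default_file`/`default_native` are assumptions made for illustration. Configuring a realm of a type with `enabled: false` is treated as explicitly disabling it.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class RealmChainSketch {

    // Hypothetical realm settings; "file" and "native" are the realm types from the text.
    static final class RealmConfig {
        final String type;
        final String name;
        final int order;
        final boolean enabled;

        RealmConfig(String type, String name, int order, boolean enabled) {
            this.type = type;
            this.name = name;
            this.order = order;
            this.enabled = enabled;
        }
    }

    static List<String> buildChain(List<RealmConfig> configured) {
        List<String> chain = new ArrayList<>();
        // A type that appears in the config is never added implicitly: either it is
        // enabled and placed by its own order, or it is explicitly disabled.
        if (configured.stream().noneMatch(r -> r.type.equals("file"))) {
            chain.add("default_file");
        }
        if (configured.stream().noneMatch(r -> r.type.equals("native"))) {
            chain.add("default_native");
        }
        // Configured, enabled realms follow, sorted by their order value.
        configured.stream()
            .filter(r -> r.enabled)
            .sorted(Comparator.comparingInt((RealmConfig r) -> r.order))
            .forEach(r -> chain.add(r.name));
        return chain;
    }
}
```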
This change removes the loop counter for for-each loops, which was a regression from 7.9.3
documented in #71584. Not counting iterations of for-each loops should be relatively safe, as they
have a natural end point. We can consider adding the check back in once we have configurable loop
counters.
This adds utility methods to each type of runtime field that return the values for a document as an array, ordered the same way as the field's doc values. This is useful for supporting the Painless Execute API in #71374.
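Here is a minimal sketch of "doc values order" for two field kinds, assuming Lucene's conventions. Sorted numeric doc values return a document's values in ascending order and keep duplicates. Sorted set doc values, used by keywords, are sorted and de-duplicated. The class and method names are hypothetical. `String` ordering matches Lucene's UTF-8 byte order only for simple (for example ASCII) input.

```java
import java.util.Arrays;

final class DocValuesOrderSketch {
    // Numeric doc values: ascending order, duplicates kept.
    static long[] asLongDocValues(long[] emitted) {
        long[] sorted = Arrays.copyOf(emitted, emitted.length);
        Arrays.sort(sorted);
        return sorted;
    }

    // Keyword doc values: sorted and de-duplicated.
    static String[] asKeywordDocValues(String[] emitted) {
        return Arrays.stream(emitted).distinct().sorted().toArray(String[]::new);
    }
}
```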
This shrinks a runtime field definition so that it fits on the screen
without scrolling. It also converts the doc into a test so we can be
sure it continues to work.
Relates to #69291
Runtime fields are much more flexible than script_fields because you
can filter and aggregate on them, so we hope folks use them! This
converts the example of using a `date_nanos` field in a script to a
runtime field so folks get used to seeing them and hopefully using them.
While I was editing this I took the opportunity to replace the script
with a real-ish example. Scripts that just load the field value are nice
and short but I hope no one uses them in real life because they just add
overhead when compared to accessing the field directly. So I made the
script do something.
Relates to #69291
Co-authored-by: Adam Locke <adam.locke@elastic.co>
For bulk operations that fall back to HotSpot intrinsic code (reading short, int, long),
using this stream brings a massive speedup. The added benchmark for reading `long` values
sees a ~100x speedup in local benchmarking, and the vLong read benchmark still sees a speedup
of slightly under 10x.
Also, this PR moves creation of the `StreamInput` out of the hot benchmark loop for all the bytes
reference benchmarks to make the benchmark less noisy and more practically useful.
(The `readLong` case using intrinsic operations is so fast that it otherwise wouldn't even show up in a profile
relative to instantiating the stream.)
Relates to work in #71181
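The stream class is not named in the text above, so this is only an illustration under that gap. It contrasts assembling a big-endian `long` byte by byte with reading it through a `VarHandle` view over the array (`MethodHandles.byteArrayViewVarHandle`, Java 9+). HotSpot typically compiles the `VarHandle` read to a single load. Big-endian matches `java.io.DataInput.readLong`. The class and method names are hypothetical, and no timings are claimed.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;

final class BulkReadSketch {
    // View a byte[] as big-endian longs; plain get() permits unaligned offsets.
    private static final VarHandle LONG_BE =
        MethodHandles.byteArrayViewVarHandle(long[].class, ByteOrder.BIG_ENDIAN);

    // One 8-byte read at an arbitrary offset.
    static long readLongIntrinsic(byte[] bytes, int offset) {
        return (long) LONG_BE.get(bytes, offset);
    }

    // Byte-at-a-time reassembly, the pattern a generic stream falls back to.
    static long readLongByteByByte(byte[] bytes, int offset) {
        long value = 0;
        for (int i = 0; i < 8; i++) {
            value = (value << 8) | (bytes[offset + i] & 0xFFL);
        }
        return value;
    }
}
```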
This change renames all the script contexts specific to runtime fields to `*_field`, such as `long_field`
and `geo_point_field`. This is an internal detail that will only be exposed through the Painless
Execute API as part of #71374, and should not have bwc issues. I tested this change locally on a
mixed cluster to ensure that scripts stored with the old runtime field context names are still both
retrievable and deletable. This works because the context name is only used during the request to
check for valid compilation, and is never stored as part of the cluster state.
`IndexFeatureStats` prints out a whole object, hence it should be a `ToXContentObject`. This way consumers like `Strings#toString` automatically know not to wrap it in a new object when printing it.
This adds a few tests for runtime field queries applied to
"filter-by-filter" style aggregations. We expect to still be able to
use filter-by-filter aggregations to speed up collection when the top
level query is a runtime field. You'd think that filter-by-filter would
be slow when the top level query is slow, like it is with runtime
fields, but we only run filter-by-filter when we can translate each
aggregation bucket into a quick query. So long as the results of those
queries don't "overlap" we shouldn't end up running the slower top level
query more times than we would during regular collection.
This also adds some javadoc to that effect to the two places where we
choose between filter-by-filter and a "native" aggregation
implementation.