As per the new licensing change for Elasticsearch and Kibana, this commit
moves the existing Apache 2.0 licensed source code to the new dual
SSPL+Elastic License 2.0. In addition, existing x-pack code now uses
the new version 2.0 of the Elastic License. Full changes include:
- Update LICENSE and NOTICE files throughout the code base, as well
as those packaged in our published artifacts
- Update IDE integration to use the new license header on newly
created source files
- Remove references to the "OSS" distribution from our documentation
- Update build time verification checks to no longer allow Apache 2.0
license header in Elasticsearch source code
- Replace all existing Apache 2.0 license headers for non-xpack code
with the updated header (vendored code with Apache 2.0 headers remains
unchanged, of course).
- Replace all Elastic License 1.0 headers with the new 2.0 header in x-pack.
Since the "*,-*" pattern resolves to "no indices", it turns a normally
destructive action into a non-destructive one. Rather than throwing a
wildcards-not-allowed exception, we can allow this pattern to pass
without triggering an exception. This allows the security layer to
safely use a "*,-*" pattern to indicate a "no indices" result for its
index resolution step, which is important because otherwise we get
wildcards-not-allowed exceptions when trying to delete nonexistent
concrete indices. For simplicity, we require exactly "*,-*", rather than
any other wildcards that might be logically equivalent.
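For illustration, a minimal sketch of the special case (hypothetical helper and class names, not the actual Elasticsearch implementation):

```java
// Recognise exactly the "*,-*" expression list and resolve it to
// "no indices" instead of throwing a wildcards-not-allowed exception.
final class NoneExpressionCheck {
    static boolean isNoneExpression(String[] expressions) {
        return expressions.length == 2
            && "*".equals(expressions[0])
            && "-*".equals(expressions[1]);
    }
}
```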
This commit mostly reverts #67934, except for the change to the version
constant `REPOSITORY_UUID_IN_REPO_DATA_VERSION`.
Completes the backport of #67829 via #67899
This commit suppresses any BWC tests related to snapshots in `master` so
that #67899 can be merged to `7.x`. It will mostly be reverted after the
merge of #67899 is complete.
Relates #66431
Today a snapshot repository does not have a well-defined identity. It
can be reregistered with a different cluster under a different name, and
can even be registered with multiple clusters in readonly mode.
This presents problems for cases where we need to refer to a specific
snapshot in a globally-unique fashion. Today we rely on the repository
being registered under the same name on every cluster, but this is not a
safe assumption.
This commit adds a UUID that can be used to uniquely identify a
repository. The UUID is stored in the top-level index blob, represented
by `RepositoryData`, and is also usually copied into the
`RepositoryMetadata` that represents the repository in the cluster
state. The repository UUID is exposed in the get-repositories API; other
more meaningful consumers will be added in due course.
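As a rough illustration of what the UUID enables (hypothetical types, not the actual implementation):

```java
import java.util.UUID;

// Two registrations refer to the same repository iff their stored UUIDs
// match, regardless of the names they were registered under on each cluster.
record RepositoryIdentity(String registeredName, UUID uuid) {
    boolean sameRepositoryAs(RepositoryIdentity other) {
        return uuid.equals(other.uuid); // names may differ across clusters
    }
}
```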
Part of the fixes for #66419, this commit permits nodes to emit the
deprecation warning regarding not specifying `?wait_for_active_shards`
when closing an index in 7.x versions for x ≥ 12. This change is
required on `master` too since the BWC tests encounter these warnings.
Relates #67246, which is the 7.x part of this change.
* Adds a minimum version request parameter to SearchRequest.
The minimum version causes a request to fail if any shards
involved in the search do not meet the compatibility requirement
(all shards need to be on a version equal to or later than the minimum
version provided).
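A hedged sketch of the per-shard check (illustrative names and plain integer version IDs; the real parameter lives on `SearchRequest`):

```java
// Fail fast if a shard's node version is older than the minimum version
// carried on the search request.
final class MinVersionCheck {
    static void checkMinVersion(int minVersion, int shardNodeVersion) {
        if (shardNodeVersion < minVersion) {
            throw new IllegalArgumentException(
                "a shard does not meet the required minimum version [" + minVersion + "]");
        }
    }
}
```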
In 7.x the close indices API defaulted to `?wait_for_active_shards=0`,
but from 8.0 it defaults to respecting the index settings instead. This
commit introduces the `index-setting` value for this parameter on this
API, allowing users to opt in to the future behaviour today, and emits a
deprecation warning indicating that the default no longer needs to be
used and will be unsupported in future.
In 7.x a follow-up PR will introduce support for the same
`index-setting` value for this parameter and will emit deprecation
warnings if users try to use the default instead.
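A minimal sketch of how the parameter value is interpreted (hypothetical helper, not the actual REST handler):

```java
// Map the ?wait_for_active_shards value on the close-index API to a behaviour.
enum WaitForActiveShardsBehaviour {
    DEFAULT,            // no value given: version-dependent default
    USE_INDEX_SETTING,  // the new "index-setting" value: respect the index settings
    EXPLICIT_COUNT;     // "0", "1", "all", etc.

    static WaitForActiveShardsBehaviour of(String param) {
        if (param == null) {
            return DEFAULT;
        }
        if ("index-setting".equals(param)) {
            return USE_INDEX_SETTING; // defer to index.write.wait_for_active_shards
        }
        return EXPLICIT_COUNT;
    }
}
```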
Relates #66419
When I merged #67043 it had an integration test for the thing it was
fixing but it still fails in the bwc tests. Yikes! I should know better
but life is life. Anyway, this updates the skip to ignore the test for
now. I'll reenable once the backport is in.
Fixes a bug where nested documents that match a filter in the `filters`
agg were counted as matching the filter. Usually nested documents
only match if you explicitly ask to match them. Worse, we only matched them
in the "filter by filter" mode that we wrote to speed up `date_histogram`.
The `filters` agg is fairly rare, but with #63643 we run
`date_histogram` and `range` aggregations using `filters`.
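The usual way to keep nested Lucene documents out of a filter's count is to intersect it with a "top-level documents only" filter; a sketch at the Lucene level (illustrative, not the actual filter-by-filter code):

```java
import org.apache.lucene.search.BooleanClause.Occur;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.Query;

// Combine the agg's filter with a query matching only non-nested documents,
// so nested docs stop being counted unless explicitly requested.
final class TopLevelFilters {
    static Query topLevelDocsOnly(Query aggFilter, Query nonNestedFilter) {
        return new BooleanQuery.Builder()
            .add(aggFilter, Occur.MUST)
            .add(nonNestedFilter, Occur.FILTER)
            .build();
    }
}
```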
We started passing down the root document's _source when processing
nested hits, to avoid reloading and reparsing the root source for each hit.
Unfortunately the approach did not work when there are multiple layers of
`inner_hits`. In this case, the second-layer inner hit received its immediate
parent's source instead of the root source. This parent source is filtered to
just contain the parts corresponding to the nested document, but the source
parsing logic is designed to always operate on the top-level root source. This
caused failures when loading the second-layer inner hits.
This PR makes sure to always pass the root document's _source when processing
inner hits, even if there are multiple layers.
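Conceptually the fix looks like this (hypothetical minimal types, for illustration only; the real change is in the inner-hits fetch code):

```java
import java.util.List;

// Always thread the root document's _source through every inner-hits
// layer, never the immediate parent's filtered source.
interface Hit {
    List<Hit> innerHits();
    Object source();
    void setSourceForParsing(Object rootSource);
}

final class InnerHitsSource {
    static void resolve(Hit root, Hit current) {
        for (Hit child : current.innerHits()) {
            child.setSourceForParsing(root.source()); // root, not current
            resolve(root, child);                     // recurse into deeper layers
        }
    }
}
```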
We were depending on BouncyCastle FIPS's own mechanics to set
itself in approved-only mode, since we run with the Security
Manager enabled. The check during startup seems to happen before we
set our restrictive SecurityManager in
org.elasticsearch.bootstrap.Elasticsearch, though, and this means that
BCFIPS would not be in approved-only mode unless explicitly
configured so.
This commit sets the appropriate JVM property to explicitly put
BCFIPS in approved-only mode in CI and adds tests to ensure that we
will be running with BCFIPS in approved-only mode when we expect to.
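For reference, a sketch of such a test-time assertion, assuming the bc-fips provider is on the classpath and the JVM was started with `-Dorg.bouncycastle.fips.approved_only=true` (the property this commit sets in CI):

```java
import org.bouncycastle.crypto.CryptoServicesRegistrar;

// Fail loudly if BCFIPS did not come up in approved-only mode.
final class FipsModeCheck {
    static void assertApprovedOnlyMode() {
        if (CryptoServicesRegistrar.isInApprovedOnlyMode() == false) {
            throw new IllegalStateException("BCFIPS is not in approved-only mode");
        }
    }
}
```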
It also sets `xpack.security.fips_mode.enabled` to true for all test clusters
used in FIPS mode and sets the distribution to the default one. It adds a
password to the Elasticsearch keystore for all test clusters that run in FIPS
mode.
Moreover, it changes a few unit tests where we would use bcrypt even in
FIPS 140 mode. These would still pass since we bundle our own
bcrypt implementation, but they are now changed to use FIPS 140-approved
algorithms instead, for better coverage.
It also addresses a number of tests that would fail in approved-only mode.
Mainly:
- Tests that use PBKDF2 with a password shorter than 112 bits (14
characters at 8 bits each). We elected to change the passwords used
everywhere to be at least 14 characters long instead of mandating the use
of pbkdf2_stretch, because both pbkdf2 and pbkdf2_stretch are supported
and allowed in FIPS mode and it makes sense to test with both. We could
figure out the password algorithm used for each test and adjust the
password length accordingly only for pbkdf2, but there is little value in
that. It's good practice to use strong passwords, so if our docs and tests
use longer passwords, then it's for the best. The approach is brittle, as
there is no guarantee that the next test to be added won't use a short
password, so we add some testing documentation too. This leaves us with a
possible coverage gap, since we support passwords as short as 6 characters
but only test with ones longer than 14 characters; the validation itself
was not tested before either, and tests can be added in a follow-up,
outside of the FIPS-related context.
- Tests that use a PKCS12 keystore and were not already muted.
- Tests that depend on running test clusters with a basic license or
using the OSS distribution, as FIPS 140 support is not available in
either of these.
Finally, it adds some information around FIPS 140 testing to our testing
documentation reference, so that developers can hopefully keep in
mind FIPS 140-related intricacies when writing/changing docs.
This makes sure that we only serve a hit from the request cache if it
was built using the same mapping, and that the same mapping is used for
the entire "query phase" of the search.
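One way to picture the change (a hypothetical key shape, not the actual cache implementation): carry the mapping's version in the request-cache key, so any mapping update invalidates previously cached hits.

```java
// A cached entry is only reusable if the mapping version still matches.
record RequestCacheKey(String indexUuid, long mappingVersion, String requestHash) {}
```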
Closes #62033
This PR fixes two bugs that can arise when _source is disabled and we fetch nested documents:
* Fix exception when highlighting `inner_hits` with disabled _source.
* Fix exception in nested `top_hits` with disabled _source.
* Add more tests for highlighting `inner_hits`.
This commit introduces a new sort field called `_shard_doc` that
can be used in conjunction with a PIT to consistently tiebreak
identical sort values. The sort value is a numeric long
composed of the ordinal of the shard (assigned by the coordinating node)
and the internal Lucene document ID. These two values are consistent within
a PIT, so this sort criterion can be used as the tiebreaker of any search
request.
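A sketch of the encoding (illustrative; the actual implementation lives in the `_shard_doc` sort field):

```java
// Shard ordinal in the high 32 bits, Lucene doc ID in the low 32 bits.
// Both are stable within a PIT, so the combined long is a consistent
// tiebreaker.
final class ShardDocTiebreaker {
    static long value(int shardOrdinal, int luceneDocId) {
        return ((long) shardOrdinal << 32) | (luceneDocId & 0xFFFFFFFFL);
    }
}
```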
Since this sort criterion is stable, we'd like to add it automatically to any
sorted search request that uses a PIT, but we also need to expose it explicitly
in order to be able to:
* Reverse the order of the tiebreaking, which is useful to search "before" a `search_after`.
* Force the primary sort to use it, in order to benefit from the `search_after` optimization when sorting by index order (to be released in Lucene 8.8).
I plan to add the documentation and the automatic configuration for PIT in a follow-up, since this change is already big.
Relates #56828
Closes #66107.
Bootstrap plugins are not loaded in the main Elasticsearch process, but
instead take effect only when ES is starting. As such, these plugins are
skipped when ES loads all installed plugins.
As a result, it was impossible for the plugins _cat API to report
whether any bootstrap plugins are installed.
Fix this by adjusting how the loading process skips bootstrap plugins,
and then tweaking the plugins _cat API so that bootstrap plugins can
optionally be included in the response.
In general, we can't guarantee that a match_all query will return documents in
the order they were indexed. This PR adds an ID to each document to avoid
relying on document order.
This commit fixes the cat tasks api parameter specification and the
handler so that the parameters are consumed during request preparation.
Closes #59493
This commit removes the reference to the suggest index metric in the
nodes stats and indices stats rest api spec files. Suggest has been
removed so it is no longer correct to have it in these files.
Closes #43407
This PR fixes a regression where fvh fragments could be loaded from the wrong
document _source.
Some `FragmentsBuilder` implementations contain a `SourceLookup` to load from
_source. The lookup should be positioned to load from the current hit document.
However, since `FragmentsBuilder` instances are cached and shared across hits,
the lookup is never updated to load from the new documents. This means we
accidentally load _source from a different document.
The regression was introduced in #60179, which started storing `SourceLookup`
on `FragmentsBuilder`.
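Conceptually, the fix is to reposition the lookup for every hit instead of relying on the position captured when the builder was cached; a sketch (method and class names approximate the codebase of that era, so treat them as assumptions):

```java
import org.apache.lucene.index.LeafReaderContext;
import org.elasticsearch.search.lookup.SourceLookup;

// Before building fragments for a hit, point the shared SourceLookup at
// that hit's segment and document so _source is loaded from the right place.
final class FragmentSourcePositioning {
    static void positionForHit(SourceLookup lookup, LeafReaderContext ctx, int docId) {
        lookup.setSegmentAndDocument(ctx, docId);
    }
}
```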
Fixes #65533.
Currently we don't have a way to selectively mute REST YAML tests
in FIPS 140 mode.
This commit introduces a new feature (fips_140) that can be used
in skip blocks to allow that.
`date_histogram` has a bug with `offset` and `extended_bounds` when it
needs to create an "empty" aggregation result: it counts the offset
twice! Wooops!
I broke this a while back when I started trying to merge `offset` into
`Rounding`. I never finished that merge, sadly. Interestingly, we've
discovered that the merge is required to properly handle daylight
savings time (#56305), but it isn't really something we're looking to
solve today. For now, this just stops counting the offset twice.
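In pseudo-arithmetic terms the bug and the fix look like this (a sketch, not the actual `Rounding` code):

```java
// Building an empty bucket's key from an extended bound.
final class EmptyBucketKey {
    static long buggy(long roundedBound, long offset) {
        return roundedBound + offset + offset; // offset counted twice: the bug
    }

    static long fixed(long roundedBound, long offset) {
        return roundedBound + offset;          // offset counted exactly once
    }
}
```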
Closes #65624
Our test framework randomly sends the body of requests over the `source`
parameter. We never send the body if it is more than 2000 bytes because
our HTTP receiver can't handle lines of more than 4096 bytes. The thing is,
when we do that 2000 byte check we do it against the reported length of the
body, not the body after it has been url-encoded. But url-encoding
strings can *vastly* increase their size, which could cause us to send
some requests over the URL that are longer than 4096 bytes.
This fixes that by checking the url-encoded length as well. We keep the
2000 byte check of the unencoded length because it is a nice fast check,
even if it is a bit inaccurate.
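A sketch of the combined check (the constants come from this commit; the helper name is made up):

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Send the body via the ?source= parameter only if both the raw length
// (the fast, slightly inaccurate pre-check) and the url-encoded length fit.
final class SourceParamCheck {
    static boolean fitsInSourceParam(String body) {
        return body.length() < 2000
            && URLEncoder.encode(body, StandardCharsets.UTF_8).length() < 2000;
    }
}
```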
Closes #65718
CoreTestTranslator re-writes some core search yaml tests into tests
for runtime queries, and mimics source-only fields by building ad-hoc
painless scripts for each runtime type. We now have source-only fields
built in via scriptless mappings, so we can cut over to using these
instead.
Currently, the 'fields' option only supports fetching mapped fields. Since
'fields' is meant to be the central place to retrieve document content, it
should allow for loading unmapped values. This change adds implementation and
tests for this feature.
Closes #63690
Currently, if you write a date range query with numeric 'to' or 'from' bounds,
they can be interpreted as years if no format is provided. We use
"strict_date_optional_time||epoch_millis" in this case, which can interpret an
input like 1000 as the year 1000, for example.
This PR changes this to always interpret and parse numbers with the "epoch_millis"
parser if no other formatter was provided.
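To see the ambiguity, here is the same input read both ways (plain java.time, not the ES parsers):

```java
import java.time.Instant;
import java.time.LocalDate;

// The bare number 1000, interpreted two ways.
final class DateBoundAmbiguity {
    static final Instant AS_EPOCH_MILLIS = Instant.ofEpochMilli(1000L); // 1970-01-01T00:00:01Z
    static final LocalDate AS_YEAR = LocalDate.of(1000, 1, 1);          // the year 1000
}
```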
Closes #63680
This ports the majority of the rest integ test tasks to use the task avoidance api.
- There are some edge cases left that we need to investigate, but we can do that separately.
Currently an error in a `_mtermvectors` request, for example because it queries
through an alias that has several indices assigned to it, fails the whole
request. Instead we should only fail the problematic item in the multi-item
request, as we do in the same situations in `_mget`.
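The shape of the fix, generically (hypothetical helper, illustration only):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Execute each item independently, turning an item-level exception into an
// item-level failure entry instead of failing the whole multi-item request.
final class MultiItemExecution {
    static <R, T> List<T> executeItems(List<R> items,
                                       Function<R, T> execute,
                                       Function<Exception, T> asFailure) {
        List<T> responses = new ArrayList<>();
        for (R item : items) {
            try {
                responses.add(execute.apply(item));
            } catch (Exception e) {
                responses.add(asFailure.apply(e)); // only this item fails
            }
        }
        return responses;
    }
}
```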