2893 Commits

Author SHA1 Message Date
Armin Braun 52e7b926a9
Make Large Bulk Snapshot Deletes more Memory Efficient (#72788)
Use an iterator instead of a list when passing around what to delete.
In the case of very large deletes the iterator is much smaller than
the actual list of files to delete (since we save all the prefixes
which adds up if the individual shard folders contain lots of deletes).
Also this commit as a side-effect adjusts a few spots in logging where the
log messages could be catastrophic in size when trace logging is activated.
2021-05-10 13:40:57 +02:00
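To illustrate the iterator-over-list idea described in #72788 above, here is a minimal sketch. It is not the Elasticsearch implementation: the class `LazyBlobPaths`, the method `fullPaths`, and the sample paths are hypothetical. The point is only that each prefixed path is built when it is requested, so the full set of prefixed strings never exists in memory at once.

```java
import java.util.Iterator;
import java.util.List;

// Hypothetical helper for illustration only; not taken from the Elasticsearch code base.
final class LazyBlobPaths {

    // Returns an iterator that concatenates prefix + name lazily, one element at a time,
    // instead of materializing a second list that repeats the prefix in every entry.
    static Iterator<String> fullPaths(String prefix, List<String> blobNames) {
        Iterator<String> names = blobNames.iterator();
        return new Iterator<String>() {
            @Override
            public boolean hasNext() {
                return names.hasNext();
            }

            @Override
            public String next() {
                return prefix + names.next();
            }
        };
    }

    public static void main(String[] args) {
        // Sample prefix and blob names are made up.
        Iterator<String> it = fullPaths("indices/abc/0/", List.of("__1", "__2", "snap-x.dat"));
        while (it.hasNext()) {
            System.out.println(it.next());
        }
    }
}
```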
Armin Braun bef9dab643
Cleanup BlobPath Class (#72860)
There should be a singleton for the empty version of this.
All the copying to `String[]` or use as an iterator make
no sense either when we can just use the list outright.
2021-05-10 00:10:39 +02:00
Ryan Ernst 8cd3944a0a
Revert "Upgrade Azure SDK and Jackson (#72833)"
This reverts commit dca0e92bef.
2021-05-06 20:51:31 -07:00
Ryan Ernst dca0e92bef
Upgrade Azure SDK and Jackson (#72833)
This commit upgrades the Azure SDK to 12.11.0 and Jackson to 2.12.2. The
Jackson upgrade must happen at the same time due to Azure depending on
this new version of Jackson.

closes #66555
closes #67214
2021-05-06 20:36:42 -07:00
Rene Groeschke e609e07cfe
Remove internal build logic from public build tool plugins (#72470)
Extract usage of internal API from TestClustersPlugin and PluginBuildPlugin and related plugins and build logic

This includes a refactoring of ElasticsearchDistribution to handle types
better in a way we can differentiate between supported Elasticsearch
Distribution types supported in TestClustersPlugin and types only supported
in internal plugins.

It also introduces a set of internal versions of public plugins.

As part of this we also generate the plugin descriptors now.

As a follow up on this we can actually move these publicly used classes into
an extra project (declared as included build)

We keep LoggedExec and VersionProperties effectively public, plus a workaround for RestTestBase
2021-05-06 14:02:35 +02:00
Armin Braun 0220dfb3fe
Dry up Hashing BytesReference (#72443)
Dries up the efficient way to hash a bytes reference and makes use
of it in a few other spots that were needlessly copying all bytes in
the bytes reference for hashing.
2021-05-06 06:32:52 +02:00
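A rough sketch of hashing without copying, in the spirit of #72443 above. It assumes the data is available as a list of byte-array chunks (a stand-in for a composite `BytesReference`); the class `ChunkedHashing` and the choice of SHA-256 are assumptions for illustration. Each chunk is fed to the digest directly rather than first being concatenated into one large array.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper; the chunk list stands in for a multi-part bytes reference.
final class ChunkedHashing {

    // Feeds each chunk to the digest in turn, so no combined copy of all bytes is created.
    static byte[] sha256(List<byte[]> chunks) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            for (byte[] chunk : chunks) {
                digest.update(chunk, 0, chunk.length);
            }
            return digest.digest();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required to be present on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] ab = "ab".getBytes(StandardCharsets.UTF_8);
        byte[] cd = "cd".getBytes(StandardCharsets.UTF_8);
        byte[] abcd = "abcd".getBytes(StandardCharsets.UTF_8);
        // The chunked hash matches the hash of the concatenated bytes.
        System.out.println(Arrays.equals(sha256(List.of(ab, cd)), sha256(List.of(abcd))));
    }
}
```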
Alan Woodward 69c88119ca
SizeMappingTests shouldn't use ESSingleNodeTestCase (#72564)
This converts the tests to use MapperServiceTestCase, and makes
them a unit test rather than a full integration test. In addition, we
test enabling/disabling the mapper by examining the output of
parsed documents, rather than by introspection on the metadata
mapper itself.
2021-04-30 16:29:33 +01:00
Alan Woodward b27eaa38dc
Remove 'external values', and replace with swapped out XContentParsers (#72203)
The majority of field mappers read a single value from their positioned
XContentParser, and do not need to call nextToken. There is a general
assumption that the same holds for any multifields defined on them, and
so the XContentParser is passed down to their multifields builder as-is.
This assumption does not hold for mappers that accept json objects,
and so we have a second mechanism for passing values around called
'external values', where a mapper can set a specific value on its context
and child mappers can then check for these external values before reading
from xcontent. The disadvantage of this is that every field mapper now
needs to check its context for external values. Because the values are
defined by their java class, we can also know that in the vast majority of
cases this functionality is unused. We have only two mappers that actually
make use of this, CompletionFieldMapper and GeoPointFieldMapper.

This commit removes external values entirely, and replaces it with the ability
to pass a modified XContentParser to multifields. FieldMappers can just check
the parser attached to their context for data and don't need to worry about
multiple sources.

Plugins implementing field mappers will need to take the removal of external
values into account. Implementations that are passing structured objects
as external values should instead use ParseContext.switchParser and
wrap the objects using MapXContentParser.wrapObject().

GeoPointFieldMapper passes on a fake parser that just wraps its input data
formatted as a geohash; CompletionFieldMapper has a slightly more complicated
parser that in general wraps its metadata, but if textOrNull() is called without
the parser being advanced just returns its text input.

Relates to #56063
2021-04-29 09:17:18 +01:00
David Turner f72fa49749
Fix S3HttpHandler chunked-encoding handling (#72378)
The `S3HttpHandler` reads the contents of the uploaded blob, but if the
upload used chunked encoding then the reader would skip one or more
`\r\n` sequences if they appeared at the start of a chunk.

This commit reworks the reader to be stricter about its interpretation
of chunks, and removes some indirection via streams since we can work
pretty much entirely on the underlying `BytesReference` instead.

Closes #72358
2021-04-28 15:13:48 +01:00
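The failure mode fixed in #72378 above can be shown with a small strict decoder for plain HTTP chunked transfer coding. This is a sketch under assumptions, not the `S3HttpHandler` code: the class `StrictChunkedDecoder` is hypothetical, it works on a `byte[]` rather than a `BytesReference`, and it ignores chunk extensions and trailers. Because it reads exactly the declared number of bytes per chunk, a `\r\n` at the start of a chunk's payload is kept as data.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical decoder for illustration; not the test fixture's actual implementation.
final class StrictChunkedDecoder {

    static byte[] decode(byte[] body) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int pos = 0;
        while (true) {
            int lineEnd = indexOfCrlf(body, pos);
            if (lineEnd < 0) {
                throw new IllegalArgumentException("missing chunk-size line");
            }
            String sizeLine = new String(body, pos, lineEnd - pos, StandardCharsets.US_ASCII);
            int semicolon = sizeLine.indexOf(';'); // anything after ';' is a chunk extension
            String hex = semicolon < 0 ? sizeLine : sizeLine.substring(0, semicolon);
            int size = Integer.parseInt(hex.trim(), 16);
            if (size < 0) {
                throw new IllegalArgumentException("negative chunk size");
            }
            pos = lineEnd + 2;
            if (size == 0) {
                return out.toByteArray(); // last chunk; trailers are ignored in this sketch
            }
            if (pos + size + 2 > body.length) {
                throw new IllegalArgumentException("truncated chunk");
            }
            // Copy exactly `size` payload bytes, even if they begin with CR LF.
            out.write(body, pos, size);
            pos += size;
            if (body[pos] != '\r' || body[pos + 1] != '\n') {
                throw new IllegalArgumentException("chunk not terminated by CRLF");
            }
            pos += 2;
        }
    }

    private static int indexOfCrlf(byte[] b, int from) {
        for (int i = from; i + 1 < b.length; i++) {
            if (b[i] == '\r' && b[i + 1] == '\n') {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // One 4-byte chunk whose payload itself starts with CR LF.
        byte[] wire = "4\r\n\r\nab\r\n0\r\n\r\n".getBytes(StandardCharsets.US_ASCII);
        System.out.println(decode(wire).length); // the payload keeps its leading CR LF
    }
}
```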
Ryan Ernst d933ecd26c
Convert path.data to String setting instead of List (#72282)
Since multiple data path support has been removed, the Setting no longer
needs to support multiple values. This commit converts the
PATH_DATA_SETTING to a String setting from List<String>.

relates #71205
2021-04-27 08:29:12 -07:00
Alan Woodward e002aa809b
Make FieldNamesFieldMapper responsible for adding its own doc fields (#71929)
The FieldNamesFieldMapper is a metadata mapper defining a field that
can be used for exists queries if a mapper does not use doc values or
norms. Currently, data is added to it via a special method on FieldMapper
that pulls the metadata mapper from a mapping lookup, checks to see
if it is enabled, and then adds the relevant value to a lucene document.

This is one of only two places that pulls a metadata mapper from the
MappingLookup, and it would be nice to remove this method. This commit
refactors field name handling by instead storing the names of fields to
index in the fieldnames field in a set on the ParseContext, and then
building the field itself in FieldNamesFieldMapper.postParse(). This means
that all of the responsibility for enabling indexing, etc, is handled within
the metadata mapper itself.
2021-04-27 16:03:46 +01:00
David Turner b42d6dbf7b
Add more snapshot details to repo data (#72232)
This commit adds the start and end time of the snapshot, and a list of
the names of any indices which were partially snapshotted, to the
top-level `RepositoryData` blob. These details will be used in a
follow-up that allows for querying snapshots by time and completeness,
and are likely also useful for SLM retention cycles which today must
iterate all the possible `SnapshotInfo` blobs to find this information.
2021-04-27 11:53:38 +01:00
Rene Groeschke 5bcd02cb4d
Restructure build tools java packages (#72030)
Related to #71593 we move all build logic that is for elasticsearch build only into
the org.elasticsearch.gradle.internal* packages

This makes it clearer if build logic is considered to be used by external projects
Ultimately we want to only expose TestCluster and PluginBuildPlugin logic
to third party plugin authors.

This is a very first step towards that direction.
2021-04-26 14:53:55 +02:00
Christoph Büscher 0519e377ed
Fix case sensitivity rules for wildcard queries on text fields (#71751)
Wildcard queries on text fields should not apply the field's analyzer to the
search query. However, we accidentally enabled this in #53127 by moving the
query normalization to the StringFieldType super type. This change fixes this by
separating the notion of normalization and case insensitivity (as implemented in
the `case_insensitive` flag). This is done because we still need to maintain
normalization of the query string when the wildcard query method on the field type is
requested from the `query_string` query parser. Wildcard queries on keyword
fields should also continue to apply the field's normalizer, regardless of
whether the `case_insensitive` is set, because normalization could involve
something other than lowercasing (e.g. substituting umlauts like in the
GermanNormalizationFilter).

Closes #71403
2021-04-26 11:50:21 +02:00
David Turner aa5d1948ba
Introduce RepositoryData.SnapshotDetails (#71826)
Today we track a couple of values for each snapshot in the top-level
`RepositoryData` blob: the overall snapshot state and the version of
Elasticsearch which took the snapshot. In the blob these values are
fields of the corresponding snapshot object, but in code they're kept in
independent maps. In the near future we would like to track some more
values in the same way, but adding a new field for every tracked value
is a little ugly. This commit refactors these values into a single
object, `SnapshotDetails`, so that it can be more easily extended in
future.
2021-04-22 14:48:39 +01:00
Armin Braun 5f69ee3fbb
Ensure GCS Repository Metadata Blob Writes are Atomic (#72051)
In the corner case of uploading a large (>5MB) metadata blob we did not set content validation
requirement on the upload request (we automatically have it for smaller requests that are not resumable
uploads). This change sets the relevant request option to enforce an MD5 hash check when writing
`BytesReference` to GCS (as is the case with all but data blob writes)

closes #72018
2021-04-22 11:27:53 +02:00
Alan Woodward 993f0b0d14
Ukrainian language plugin can fill up heap (#71998)
The lucene Ukrainian analyzer has a bug where a large in-memory
dictionary is loaded and stored on a thread local for every tokenstream
generated in a new thread (for more details see
https://issues.apache.org/jira/browse/LUCENE-9930). Due to checks
added in #50908, we create a tokenstream for every registered
analyzer in every shard, which means that any node with the ukrainian
plugin installed will leak one copy of this dictionary per shard,
whether or not the ukrainian analyzer is actually being used.

This commit makes the plugin use a fixed version of the
UkrainianMorfologikAnalyzer, until we merge a version of lucene that
contains the upstream fix.
2021-04-21 12:13:31 +01:00
Adrien Grand 25750a3696
Make intervals queries fully pluggable through field mappers. (#71429)
`MappedFieldType` only allows configuring `match` and `prefix` queries today.
This change makes it possible to configure how to create `wildcard` and `fuzzy`
queries as well.

This will allow making the upcoming `match_only_text` field fully support
intervals queries.
2021-04-20 18:10:12 +02:00
Francisco Fernández Castaño 68693523b5
Add read barrier to AzureBlobStore#convertStreamToByteBuffer (#71832)
To write the contents from an InputStream we convert it into a
Flux<ByteBuffer>, since we only have 1 IO thread for the Azure
SDK, the reads from the underlying InputStream are dispatched
to different threads in order to avoid blocking the IO thread.
This could cause visibility problems when the underlying InputStream
holds state that is not synchronized. In order to avoid this,
this commit introduces explicit synchronization barriers in order
to ensure visibility. The overhead should be fairly minimal, since
the contention should be small.
2021-04-20 17:43:03 +02:00
Nhat Nguyen a461597c75
Upgrade to Lucene 8.8.2 on 8.0 (#71587) 2021-04-14 08:52:23 -04:00
Lyudmila Fokina 3b0b7941ae
Warn users if security is implicitly disabled (#70114)
* Warn users if security is implicitly disabled

Elasticsearch has security features implicitly disabled by default for
Basic and Trial licenses, unless explicitly set in the configuration
file.
This may be good for onboarding, but it can also lead to unintended insecure
 clusters.
 This change introduces clear warnings when security features are
 implicitly disabled.
 - a warning header in each REST response if security is implicitly
 disabled;
 - a log message during cluster boot.
2021-04-13 18:33:41 +02:00
Rene Groeschke 0f40889879
Update build to Gradle 7.0 (#68506)
- Update gradle wrapper to gradle 7.0
- Remove deprecated usages to make build 7.0 compatible
- Fix excludes in docs snippet tasks (See https://github.com/gradle/gradle/issues/16160 for details)
- Fix deprecation warnings in 7.0
- Add explicit dependencies that have been missed
- Make extract native licenses tasks output dir more explicit
- Use a snapshot of the ospackage plugin that includes a fix for 7.0 already
- fix test runtime classpath setup in repository-hdfs
- Make task dependency explicit to fix further deprecation warnings
- Remove manual check for http repo usages that has been deprecated in gradle 7.0
- Update spock to latest 2.0 milestone required for groovy 3
2021-04-13 09:15:08 +02:00
Jake Landis b1ef1fd800
Introduce yamlRestCompatTests for :plugins projects (#71440) 2021-04-08 16:11:50 -05:00
Armin Braun f8a7576007
S3 Metrics Collection Should not Count Requests without Response (#71406)
It's in the title. Currently we count requests as having been executed even if the
request was broken due to a network issue and no response was ever received.
In order to not overcount requests relative to what the S3 endpoint would measure for e.g. billing
purposes we should ignore these requests.

closes #70060
2021-04-07 18:07:26 +02:00
Yannick Welsch 801c50985c
Use default application credentials for GCS repositories (#71239)
Adds support for "Default Application Credentials" for GCS repositories, making it easier to set up a repository on GCP,
as all relevant information to connect to the repository is retrieved from the environment, not necessitating complicated
keystore setups.
2021-04-06 15:16:00 +02:00
Armin Braun 821383378e
Reduce Memory Use of Parallel Azure Blob Deletes (#71330)
1. Limit the number of blob deletes to execute in parallel to `100`.
2. Use flat listing when deleting a directory to require fewer listings
in between deletes and keep the code simpler.

closes #71267
2021-04-06 14:45:03 +02:00
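The first point of #71330 above, capping the number of deletes in flight, can be sketched with a plain `Semaphore`. This is a swapped-in technique for illustration: it does not use the Azure SDK or its reactive types, and the class `BoundedParallelDeletes`, the `deleteOne` callback, and the blob names are all hypothetical. The method returns the peak concurrency it observed so the bound can be checked.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

// Hypothetical sketch using java.util.concurrent; not the repository-azure implementation.
final class BoundedParallelDeletes {

    // Runs deleteOne for every blob with at most maxConcurrent calls in flight.
    // Returns the highest number of simultaneous calls that was observed.
    static int deleteAll(List<String> blobs, int maxConcurrent, Consumer<String> deleteOne) {
        Semaphore permits = new Semaphore(maxConcurrent);
        AtomicInteger inFlight = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        ExecutorService pool = Executors.newCachedThreadPool();
        try {
            for (String blob : blobs) {
                permits.acquire(); // blocks the submitter once the limit is reached
                pool.execute(() -> {
                    try {
                        peak.accumulateAndGet(inFlight.incrementAndGet(), Math::max);
                        deleteOne.accept(blob);
                    } finally {
                        inFlight.decrementAndGet();
                        permits.release();
                    }
                });
            }
            pool.shutdown();
            if (pool.awaitTermination(1, TimeUnit.MINUTES) == false) {
                throw new IllegalStateException("deletes did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            pool.shutdownNow();
        }
        return peak.get();
    }

    public static void main(String[] args) {
        List<String> blobs = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            blobs.add("blob-" + i);
        }
        Set<String> deleted = ConcurrentHashMap.newKeySet();
        int peak = deleteAll(blobs, 100, deleted::add);
        System.out.println("deleted=" + deleted.size() + " peak<=100: " + (peak <= 100));
    }
}
```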
Jason Tedor 32314493a2
Pass override settings when creating test cluster (#71203)
Today when creating an internal test cluster, we allow the test to
supply the node settings that are applied. The extension point to
provide these settings has a single integer parameter, indicating the
index (zero-based) of the node being constructed. This allows the test
to make some decisions about the settings to return, but it is too
simplistic. For example, imagine a test that wants to provide a setting,
but some values for that setting are not valid on non-data nodes. Since
the only information the test has about the node being constructed is
its index, it does not have sufficient information to determine if the
node being constructed is a non-data node or not, since this is done by
the test framework externally by overriding the final settings with
specific settings that dictate the roles of the node. This commit changes
the test framework so that the test has information about what settings
are going to be overridden by the test framework after the test provides
its test-specific settings. This allows the test to make informed
decisions about what values it can return to the test framework.
2021-04-02 10:20:36 -04:00
Christoph Büscher ba0ecac934
Add _size and _doc_count to fields output (#70575)
Currently metadata fields like `_size` or `_doc_count` cannot be retrieved using
the fields API. With this change, we allow this if the field is explicitly
queried for using its name, but won't include metadata fields when e.g.
requesting all fields via "*".
With this change, not all metadata fields will be retrievable by using its name,
but support for "_size" and "_doc_count" (which is fetched from source) is
added. Support for other metadata field types will need to be decided case by
case and an appropriate ValueFetcher needs to be supplied.

Relates to #63569
2021-03-31 19:24:21 +02:00
Mark Vieira 6339691fe3
Consolidate REST API specifications and publish under Apache 2.0 license (#70036) 2021-03-26 16:20:14 -07:00
Nik Everett 91c700bd99
Super randomized tests for fetch fields API (#70278)
We've had a few bugs in the fields API where it doesn't behave like we'd
expect. Typically this happens because it isn't obvious what we expect. So
we'll try and use randomized testing to ferret out what we want. This adds
a test for most field types that asserts that `fields` works similarly
to `docvalues_fields`. We expect this to be true for most fields.

It does so by forcing all subclasses of `MapperTestCase` to define a
method that makes random values. It declares a few other hooks that
subclasses can override to further randomize the test.

We skip the test for a few field types that don't have doc values:
* `annotated_text`
* `completion`
* `search_as_you_type`
* `text`
We should come up with some way to test these without doc values, even
if it isn't as nice. But that is a problem for another time, I think.

We skip the test for a few more types just because I wanted to cut this
PR in half so we could get to reviewing it earlier. We'll get to those
in a follow up change.

I've filed a few bugs for things that are inconsistent with
`docvalues_fields`. Typically that means that we have to limit the
random values that we generate to those that *do* round trip properly.
2021-03-24 14:16:27 -04:00
Hendrik Muhs f1b89fad5b
add test framework for json schema validation of rest spec body's (#69902)
Rest API specs define the APIs used at the rest level, however these specs
only define the endpoint and the parameters. We miss definitions for the
body, especially when it comes to rich bodies like they are used in ML. 

This change introduces an abstract testcase for json schema validation. This
allows developers to validate any object that is serializable to JSON - using
the `ToXContent` - to be tested against a json schema. You can use it for REST
input and outputs, but also for internal objects (documents) and
`ToXContentFragments`.

As the overall goal is to ensure it validates properly, the testcase enforces
strictness. A schema file must spec all properties. This will ensure that once
a schema test has been added, it won't go out of sync. Every change to the
pojo enforces a schema update as otherwise the test would fail.

Schemas can load sub-schemas from extra files. That way you can re-use schemas
e.g. in hierarchies or re-use a schema for similar but not same interfaces.
2021-03-17 08:30:40 +01:00
Rory Hunter d181b947c2
Remove depth limit from checkstyle negation rule (#70274)
The Checkstyle rule that bans unary negation in favour of an explicit
`== false` has a `maximumDepth` of 2 configured, which meant that it
didn't catch all violations. The `maximumDepth` isn't required (actually
it has a really high default), so this change removes the limit and
fixes the resulting violations.
2021-03-10 22:06:50 +00:00
Alan Woodward 49897be1bc
Fix position increment gap on phrase/prefix analyzers (#70096)
Custom position increments are handled by wrapping analyzers
with a NamedAnalyzer and passing the custom increment through
to its constructor. However, phrase and prefix analyzers use
delegating analyzer wrappers to add extra filtering to their parent
analyzers, and we can't wrap analyzers multiple times because this
wrecks reuse strategies, so we unwrap the parent before passing
it to phrase and prefix builders. This unwrapping means that we
lose the custom position increments; in particular, it means that
we can end up with a position increment gap of -1, which is the
sentinel value for the unset parameter - and that means exceptions
at index time for backwards-moving positions on fields with multiple
values.

This commit removes the sentinel value and uses standard parameter
defaults and the isConfigured() method instead, plus it adds some
more comprehensive testing for position increments when combined
with phrase/prefix index options on text fields.

Fixes #70049
2021-03-09 12:10:13 +00:00
David Turner 60d53c0206
Stop double-starting transport service in tests (#70056)
Today in tests we often use a utility method that creates and starts a
transport service, and then we start it again in the tests anyway. This
commit removes this unnecessary code and asserts that we only ever call
`TransportService#acceptIncomingRequests` once.
2021-03-08 11:04:43 +00:00
Francisco Fernández Castaño 22ef725a2f
Add integration test for Azure multi block uploads (#69267)
Relates #68957
2021-03-01 15:15:43 +01:00
Armin Braun bb77ab46e0
Stop Ignoring Exceptions on Close in Network Code (#69665)
We should not be ignoring and suppressing exceptions on releasing
network resources quietly in these spots.

Co-authored-by: David Turner <david.turner@elastic.co>
2021-03-01 14:38:18 +01:00
Armin Braun c86d9e3cf6
Simplify BwC for UUIDs in RepositoryData (#69335)
We don't need to separately handle BwC for these two fields since they
were both introduced in `7.12`.
2021-02-22 15:42:09 +01:00
Mark Vieira dabf857548
Remove integration testing using OSS distribution (#69153) 2021-02-17 13:57:04 -08:00
Rene Groeschke 6c957475f0
Split test artifact plugin into base and conventional plugin (#69051)
This is a follow up on #68766 which allows reusing test artifact setup
for other sourceSet than `test`
2021-02-17 12:08:13 +01:00
Marios Trivyzas 1e12c93a31
Fix issue with AnnotatedTextHighlighter and max_analyzed_offset (#69028)
With the newly introduced `max_analyzed_offset` the analyzer of
`AnnotatedTextHighlighter` was wrapped twice with the
`LimitTokenOffsetAnalyzer` by mistake.

Follows: #67325
2021-02-16 17:08:07 +01:00
Marios Trivyzas f9af60bf69
Add query param to limit highlighting to specified length (#67325)
Add a `max_analyzed_offset` query parameter to allow users
to limit the highlighting of text fields to a value less than or equal to the
`index.highlight.max_analyzed_offset`, thus avoiding an exception when
the length of the text field exceeds the limit. The highlighting still takes place,
but stops at the length defined by the new parameter.

Closes: #52155
2021-02-16 09:25:45 +01:00
David Turner b3d5d32209
Adjust encoding of Azure block IDs (#68957)
Today we represent block IDs sent to Azure using the URL-safe base-64
encoding. This makes sense: these IDs appear in URLs. It turns out that
Azure rejects this encoding for block IDs and instead demands that they
are represented using the regular, URL-unsafe, base-64 encoding instead,
then further wrapped in %-encoding to deal with the URL-unsafe
characters that inevitably result.

Relates #66489
2021-02-15 11:34:27 +00:00
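The difference between the two encodings discussed in #68957 above can be seen with the JDK alone. This sketch is illustrative and does not call the Azure SDK; the class `BlockIdEncoding` and the two sample bytes are made up, chosen because they produce the characters `+` and `/` that the URL-safe alphabet replaces. Note that `URLEncoder` performs form-style percent-encoding, which is sufficient here because base-64 output contains no spaces.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper for illustration; not the repository-azure code.
final class BlockIdEncoding {

    // Regular (URL-unsafe) base-64: may contain '+', '/' and '='.
    static String regularBase64(byte[] raw) {
        return Base64.getEncoder().encodeToString(raw);
    }

    // URL-safe base-64: uses '-' and '_' instead of '+' and '/'.
    static String urlSafeBase64(byte[] raw) {
        return Base64.getUrlEncoder().encodeToString(raw);
    }

    // Percent-encodes the URL-unsafe characters so the ID can be placed in a URL.
    static String percentEncoded(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] raw = { (byte) 0xfb, (byte) 0xff };
        System.out.println(urlSafeBase64(raw));                 // -_8=
        System.out.println(regularBase64(raw));                 // +/8=
        System.out.println(percentEncoded(regularBase64(raw))); // %2B%2F8%3D
    }
}
```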
Rory Hunter 2d44cce31e
Replace NOT operator with explicit `false` check - part 9 (#68645)
Part 9.

We have an in-house rule to compare explicitly against `false` instead
of using the logical not operator (`!`). However, this hasn't
historically been enforced, meaning that there are many violations in
the source at present.

We now have a Checkstyle rule that can detect these cases, but before we
can turn it on, we need to fix the existing violations. This is being
done over a series of PRs, since there are a lot to fix.
2021-02-08 15:28:57 +00:00
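For readers unfamiliar with the rule behind #68645 above, a small example of the preferred style follows. The class `NegationStyle` and its method are hypothetical; only the form of the comparison matters.

```java
import java.util.List;

// Hypothetical example; it only demonstrates the comparison style.
final class NegationStyle {

    static String describe(List<String> items) {
        // Discouraged under the rule: if (!items.isEmpty()) { ... }
        if (items.isEmpty() == false) {
            return "non-empty";
        }
        return "empty";
    }

    public static void main(String[] args) {
        System.out.println(describe(List.of("a"))); // non-empty
        System.out.println(describe(List.of()));    // empty
    }
}
```

The explicit `== false` is harder to overlook than a single `!` character when scanning a condition.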
Luca Cavanna 0ca6819882
DocumentMapper to not implement ToXContent (#68653)
DocumentMapper does not need to implement ToXContent, in fact it is its inner Mapping that needs to and already does. Consumers can switch to calling mapping() and toXContent against it.
2021-02-08 14:17:31 +01:00
Mark Vieira a92a647b9f Update sources with new SSPL+Elastic-2.0 license headers
As per the new licensing change for Elasticsearch and Kibana this commit
moves existing Apache 2.0 licensed source code to the new dual license
SSPL+Elastic license 2.0. In addition, existing x-pack code now uses
the new version 2.0 of the Elastic license. Full changes include:

 - Updating LICENSE and NOTICE files throughout the code base, as well
   as those packaged in our published artifacts
 - Update IDE integration to now use the new license header on newly
   created source files
 - Remove references to the "OSS" distribution from our documentation
 - Update build time verification checks to no longer allow Apache 2.0
   license header in Elasticsearch source code
 - Replace all existing Apache 2.0 license headers for non-xpack code
   with updated header (vendored code with Apache 2.0 headers obviously
   remains the same).
 - Replace all Elastic license 1.0 headers with new 2.0 header in xpack.
2021-02-02 16:10:53 -08:00
David Turner dd519a9eba
Introduce string constant for readonly setting (#68291)
A blob store repository can be put in readonly mode by setting
`readonly: true` in its settings. In the codebase the setting key is
just the literal string `"readonly"` wherever it's used and it takes
some effort to determine what the right setting name is, in particular
to check each time that it's not spelled `"read_only"`.

This commit replaces those literal `"readonly"` strings with the
`BlobStoreRepository#READONLY_SETTING_KEY` constant to reduce this
trappiness.
2021-02-01 15:43:37 +00:00
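A minimal sketch of the pattern in #68291 above, assuming settings are held in a plain `Map<String, String>`. The class `RepoSettingKeys` and the helper `isReadOnly` are hypothetical; only the constant name `READONLY_SETTING_KEY` and the key `"readonly"` come from the commit message.

```java
import java.util.Map;

// Hypothetical stand-in; the commit places the constant on BlobStoreRepository.
final class RepoSettingKeys {

    static final String READONLY_SETTING_KEY = "readonly";

    // Looks the flag up through the constant so callers never retype the literal.
    static boolean isReadOnly(Map<String, String> settings) {
        return Boolean.parseBoolean(settings.getOrDefault(READONLY_SETTING_KEY, "false"));
    }

    public static void main(String[] args) {
        System.out.println(isReadOnly(Map.of("readonly", "true")));  // true
        System.out.println(isReadOnly(Map.of("read_only", "true"))); // false: misspelled key is ignored
    }
}
```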
Ignacio Vera 747773d5af
Upgrade to Lucene 8.8.0 (#68272) 2021-02-01 13:36:03 +01:00
Mark Vieira 413e6bac07
Disable secureHdfs fixture when testing on JDK 16 (#68182) 2021-01-28 18:25:59 -08:00
Armin Braun 77162071f5
Add ClusterUUID to RepositoryData (#68002)
Record the clusterUUID of the last cluster to write
to a repository in the `RepositoryData` and use it for more
meaningful logging when running into a concurrent modification
issue.
2021-01-28 12:38:15 +01:00
Rory Hunter ad1f876daa
Replace NOT operator with explicit `false` check (#67817)
We have an in-house rule to compare explicitly against `false` instead
of using the logical not operator (`!`). However, this hasn't
historically been enforced, meaning that there are many violations in
the source at present.

We now have a Checkstyle rule that can detect these cases, but before we
can turn it on, we need to fix the existing violations. This is being
done over a series of PRs, since there are a lot to fix.
2021-01-26 14:47:09 +00:00
David Turner e5a15d4fcb
Introduce repository UUIDs (#67829)
Today a snapshot repository does not have a well-defined identity. It
can be reregistered with a different cluster under a different name, and
can even be registered with multiple clusters in readonly mode.

This presents problems for cases where we need to refer to a specific
snapshot in a globally-unique fashion. Today we rely on the repository
being registered under the same name on every cluster, but this is not a
safe assumption.

This commit adds a UUID that can be used to uniquely identify a
repository. The UUID is stored in the top-level index blob, represented
by `RepositoryData`, and is also usually copied into the
`RepositoryMetadata` that represents the repository in the cluster
state. The repository UUID is exposed in the get-repositories API; other
more meaningful consumers will be added in due course.
2021-01-25 12:17:52 +00:00
Jim Ferenczi e77c523bd9
Upgrade to a new lucene 8.8.0 snapshot (#67691)
This change upgrades to the latest Lucene 8.8.0 snapshot.
It also restores the compression on binary doc values that was lost in the last snapshot upgrade.
The compression is now configurable on binary doc values but we don't expose this functionality yet so this commit ensures that we pick the same compression mode as previous releases (BEST_COMPRESSION).
2021-01-19 13:33:19 +01:00
Armin Braun 6d025d3a27
Log Slowness on Sending Transport Messages (#67664)
Similar to #62444 but for the outbound path.

This does not detect slowness in individual transport handler logic,
this is done via the inbound handler logging already, but instead
warns if it takes a long time to hand off the message to the relevant
transport thread and then transfer the message over the wire.
This gives some visibility into the stability of the network
connection itself and into the reasons for slow network
responses (if they are the result of slow networking on the sender).
2021-01-19 12:19:32 +01:00
Rory Hunter 1a05a5ac24
Introduce deprecation categories (#67443)
Closes #64824. Introduce the concept of categories to deprecation
logging. Every location where we log a deprecation message must now
include a deprecation category.
2021-01-18 16:16:54 +00:00
Julie Tibshirani 5852fbedf5
Rename QueryShardContext -> SearchExecutionContext. (#67490)
We decided to rename `QueryShardContext` to clarify that it supports all parts
of search request execution. Before there was confusion over whether it should
only be used for building queries, or maybe only used in the query phase. This
PR also updates the javadocs.

Closes #64740.
2021-01-14 09:11:59 -08:00
David Turner bc1f50c523
Permit wait_for_active_shards warnings in master (#67498)
Part of the fixes for #66419, this commit permits nodes to emit the
deprecation warning regarding not specifying `?wait_for_active_shards`
when closing an index in 7.x versions for x ≥ 12. This change is
required on `master` too since the BWC tests encounter these warnings.

Relates #67246, which is the 7.x part of this change.
2021-01-14 15:55:43 +00:00
Nik Everett 7b0b09dfd7
Help eclipse compilation (#67403)
Eclipse wasn't seeing the special shadow jars we were making for
repository-azure and repository-hdfs so it wasn't able to compile those
plugins. This points Eclipse at the project that we use to build the
shadow jar which gets it compiling. The tests don't pass because we
aren't pointing at the shadow jars but at least we compile.
2021-01-13 13:37:46 -05:00
Francisco Fernández Castaño 4b9f2e94bd
Increase Azure client timeout on tests (#67210)
Additionally, this commit improves the error messages provided as
previously we weren't including the blob name on
deletion failures.

Closes #67119
2021-01-13 13:57:13 +01:00
Francisco Fernández Castaño 78ad79a87f
Remove assertion that checks the exception message on AzureBlobContainerRetriesTests#testRetryUntilFail (#67258)
The error message might change depending on the timing when we try
to read from the stream. Since we already check that we're not able
to read any data this assertion doesn't add much value.
2021-01-12 14:05:28 +01:00
Ignacio Vera 604ee06a3b
Upgrade to lucene-8.8-snapshot-f73f6b1 (#67228) 2021-01-12 08:03:00 +01:00
Dan Hermann 8b05edaeb5
Fix attachment processor test that fails on Windows (#67156) 2021-01-08 07:10:04 -06:00
Mark Vieira 22a6811802 Mute AzureBlobStoreRepositoryTests.testLargeBlobCountDeletion 2021-01-07 11:54:05 -08:00
Francisco Fernández Castaño ac63c6dcf5
Fix AzureBlobContainerRetriesTests#testRetryUntilFail (#67077)
We were too aggressive with retries and in certain scenarios (CI) it
was possible that when the SDK had retried n times the http handler
had some pending backlog that didn't account for all the performed
requests.

Closes #66865
2021-01-06 13:47:21 +01:00
Francisco Fernández Castaño 9950cd24be
Add Ability to Write a BytesReference to Azure BlobContainer (#66683) 2021-01-06 12:01:11 +01:00
Francisco Fernández Castaño f1ebe1195c
Avoid early task cancellation during azure parallel blob deletions (#66929)
Closes #66633
2021-01-05 11:08:16 +01:00
Albert Zaharovits a184486362
Fix azure repo stream exhaust check for multipart uploads (#66769)
This PR fixes the validation of the conversion from an input stream to a flux in the
AzureBlobStore's multipart update logic, which erroneously checked that the upload
input stream is exhausted after each part's flux is completed.
2021-01-04 17:49:40 +02:00
markharwood aa01af882e
Annotated text plugin highlighter causes "array_index_out_of_bounds_exception" (#66593)
Recent changes to the way Analyzers and field mappings are managed revealed a bug in the AnnotatedHighlighterAnalyzer class.
Old sequences of calls avoided the issue but under the new scheme a counter reset was required between documents being highlighted.
Closes #66535
2021-01-04 15:41:49 +00:00
Rene Groeschke eee6e11883
Port all task definitions to task avoidance api (#66738)
This finishes porting all tasks created in gradle build scripts and plugins to use 
the task avoidance api (see #56610)

* Port all task definitions to task avoidance api
* Fix last task created during configuration
* Fix test setup in  :modules:reindex
* Declare proper task inputs
2021-01-04 12:32:19 +01:00
Armin Braun f0459f63f2
Fix S3ClientSettings Class Loading (#66886)
This is motivated by the inability to run
`org.elasticsearch.repositories.encrypted.EncryptedS3BlobStoreRepositoryIntegTests`
in isolation without this workaround. The way integration tests load classes
otherwise leads to a load order that doesn't load the plugin class first,
and thus fails to apply the Jackson workaround before further S3 classes
that depend on it are loaded.
2021-01-04 12:30:34 +01:00
Mark Tozzi e26c9bbd52
Rename BYTES ValuesSourceType to reflect intended usage (#66762) 2020-12-30 12:39:17 -05:00
Albert Zaharovits cd72f45c33
Client-side encrypted snapshot repository (feature flag) (#66773)
The client-side encrypted repository is a new type of snapshot repository that
internally delegates to the regular variants of snapshot repositories (of types
Azure, S3, GCS, FS, and maybe others but not yet tested). After the encrypted
repository is set up, it is transparent to the snapshot and restore APIs (i.e. all
snapshots stored in the encrypted repository are encrypted, no other parameters
required).
The encrypted repository is protected by a password stored on every node's
keystore (which must be the same across the nodes).
The password is used to generate a key encryption key (KEK), using the PBKDF2
function, which is used to encrypt (using the AES Wrap algorithm) other
symmetric keys (referred to as DEK - data encryption keys), which themselves
are generated randomly, and which are ultimately used to encrypt the snapshot
blobs.
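
A minimal sketch of the key hierarchy described above, using only the JDK's standard crypto providers. It is not the repository's actual implementation: the class and method names, the PBKDF2 variant (`PBKDF2WithHmacSHA512`), the iteration count, the salt handling and the 256-bit key sizes are all assumptions chosen for illustration. Only the use of PBKDF2 for the KEK and AES Wrap for the DEKs comes from the description.

```java
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.PBEKeySpec;
import javax.crypto.spec.SecretKeySpec;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;

public class KeyWrapSketch {

    // Derive the key encryption key (KEK) from the repository password with PBKDF2.
    // Iteration count and key length are illustrative assumptions.
    static SecretKey deriveKek(char[] password, byte[] salt) throws GeneralSecurityException {
        PBEKeySpec spec = new PBEKeySpec(password, salt, 10_000, 256);
        byte[] keyBytes = SecretKeyFactory.getInstance("PBKDF2WithHmacSHA512")
            .generateSecret(spec)
            .getEncoded();
        return new SecretKeySpec(keyBytes, "AES");
    }

    // Generate a random data encryption key (DEK); this is the key that would encrypt blobs.
    static SecretKey newDek() throws GeneralSecurityException {
        KeyGenerator generator = KeyGenerator.getInstance("AES");
        generator.init(256, new SecureRandom());
        return generator.generateKey();
    }

    // Encrypt (wrap) the DEK under the KEK with the AES Wrap algorithm.
    static byte[] wrap(SecretKey kek, SecretKey dek) throws GeneralSecurityException {
        Cipher cipher = Cipher.getInstance("AESWrap");
        cipher.init(Cipher.WRAP_MODE, kek);
        return cipher.wrap(dek);
    }

    // Recover the DEK from its wrapped form; fails if the KEK (i.e. the password) is wrong.
    static SecretKey unwrap(SecretKey kek, byte[] wrappedDek) throws GeneralSecurityException {
        Cipher cipher = Cipher.getInstance("AESWrap");
        cipher.init(Cipher.UNWRAP_MODE, kek);
        return (SecretKey) cipher.unwrap(wrappedDek, "AES", Cipher.SECRET_KEY);
    }

    public static void main(String[] args) throws Exception {
        byte[] salt = new byte[16];
        new SecureRandom().nextBytes(salt);

        SecretKey kek = deriveKek("a-long-test-password".toCharArray(), salt);
        SecretKey dek = newDek();
        byte[] wrapped = wrap(kek, dek);
        SecretKey recovered = unwrap(kek, wrapped);

        System.out.println("DEK round-trips: " + Arrays.equals(dek.getEncoded(), recovered.getEncoded()));
    }
}
```

In this arrangement only the wrapped DEKs would need to be stored next to the data, and the password never encrypts blobs directly.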

For example, here is how to set up an encrypted  FS repository:
------
 1) make sure that the cluster runs under at least a "platinum" license
(simplest test configuration is to put `xpack.license.self_generated.type: "trial"`
in the elasticsearch.yml file)
 2) identical to the un-encrypted FS repository, specify the mount point of the
shared FS in the elasticsearch.yml conf file (on all the cluster nodes),
e.g. `path.repo: ["/tmp/repo"]`
 3) store the repository password inside the elasticsearch.keystore, *on every cluster node*.
In order to support changing the password on an existing repository (implemented in a follow-up),
the password itself must be named, e.g. for the "test_enc_pass" repository password name:
`./bin/elasticsearch-keystore add repository.encrypted.test_enc_pass.password`
*type in the password*
4) start up the cluster and create the new encrypted FS repository, named "test_enc", by calling:
`
curl -X PUT "localhost:9200/_snapshot/test_enc?pretty" -H 'Content-Type: application/json' -d'
{
  "type": "encrypted",
  "settings": {
    "location": "/tmp/repo/enc",
    "delegate_type": "fs",
    "password_name": "test_enc_pass"
  }
}
'
`
5) the snapshot and restore APIs work unmodified when they refer to this new repository, e.g.
` curl -X PUT "localhost:9200/_snapshot/test_enc/snapshot_1?wait_for_completion=true"`


Related: #49896 #41910 #50846 #48221 #65768
2020-12-23 23:46:59 +02:00
Ioannis Kakavas bd873698bc
Ensure CI is run in FIPS 140 approved only mode (#64024)
We were depending on the BouncyCastle FIPS own mechanics to set
itself in approved only mode since we run with the Security
Manager enabled. The check during startup seems to happen before we
set our restrictive SecurityManager though in
org.elasticsearch.bootstrap.Elasticsearch , and this means that
BCFIPS would not be in approved only mode, unless explicitly
configured so.

This commit sets the appropriate JVM property to explicitly set
BCFIPS in approved only mode in CI and adds tests to ensure that we
will be running with BCFIPS in approved only mode when we expect to.
It also sets xpack.security.fips_mode.enabled to true for all test clusters
used in fips mode and sets the distribution to the default one. It adds a
password to the elasticsearch keystore for all test clusters that run in fips
mode.
Moreover, it changes a few unit tests where we would use bcrypt even in
FIPS 140 mode. These would still pass since we are bundling our own
bcrypt implementation, but are now changed to use FIPS 140 approved
algorithms instead for better coverage.

It also addresses a number of tests that would fail in approved only mode.
Mainly:

    Tests that use PBKDF2 with a password less than 112 bits (14char). We
    elected to change the passwords used everywhere to be at least 14
    characters long instead of mandating
    the use of pbkdf2_stretch because both pbkdf2 and
    pbkdf2_stretch are supported and allowed in fips mode and it makes sense
    to test with both. We could possibly figure out the password algorithm used
    for each test and adjust password length accordingly only for pbkdf2 but
    there is little value in that. It's good practice to use strong passwords so if
    our docs and tests use longer passwords, then it's for the best. The approach
    is brittle as there is no guarantee that the next test that will be added won't
    use a short password, so we add some testing documentation too.
    This leaves us with a possible coverage gap, since we support passwords
    as short as 6 characters but only test with 14 or more; however, that
    validation was not tested before either. Tests can be added in a followup,
    outside of fips related context.

    Tests that use a PKCS12 keystore and were not already muted.

    Tests that depend on running test clusters with a basic license or
    using the OSS distribution as FIPS 140 support is not available in
    either of these.

Finally, it adds some information around FIPS 140 testing in our testing
documentation reference so that developers can hopefully keep in
mind fips 140 related intricacies when writing/changing docs.
2020-12-23 21:00:49 +02:00
Gordon Brown 045abd82d4
Mute AzureStorageCleanupThirdPartyTests.testCleanup (#66635)
See https://github.com/elastic/elasticsearch/issues/66633
2020-12-18 12:52:15 -07:00
Armin Braun 3819fcb582
Add Ability to Write a BytesReference to BlobContainer (#66501)
Except when writing actual segment files to the blob store
we always write `BytesReference` instead of a stream.
Only having the stream API available forces needless copies
on us. I fixed the straight-forward needless copying for
HDFS and FS repos in this PR, we could do similar fixes for
GCS and Azure as well and thus significantly reduce the peak
memory use of these writes on master nodes in particular.
2020-12-17 17:42:29 +01:00
Francisco Fernández Castaño c96b3ba9b6
Fix AzureBlobContainerRetriesTests#testRetryUntilFails (#66531)
Add a clearer approach to this test
2020-12-17 16:53:35 +01:00
Francisco Fernández Castaño 02ac68eb8b
Reduce memory usage on Azure repository implementation (#66489)
This commit moves the upload logic to the repository itself
instead of delegating into the SDK.
Multi-block uploads are now done sequentially instead of in parallel,
which bounds the outstanding memory.
Additionally the number of i/o threads and heap arenas have been 
reduced to 1, to reduce the memory overhead.
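
The memory argument can be illustrated with a small sketch. It is not the Azure repository code: `BlockSink` is a hypothetical stand-in for whatever accepts blocks and a final commit, and the class and method names are invented for the example. Because blocks are sent one after another, a single buffer of `blockSize` bytes can be reused, so the outstanding memory stays at one block regardless of blob size.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class SequentialBlockUploadSketch {

    /** Hypothetical receiver of blocks; a real store would send each block over the network. */
    interface BlockSink {
        void putBlock(int index, byte[] buffer, int length) throws IOException;

        void commit(int blockCount) throws IOException;
    }

    /** Uploads the stream block by block and returns the number of blocks sent. */
    static int upload(InputStream in, int blockSize, BlockSink sink) throws IOException {
        byte[] buffer = new byte[blockSize]; // the only buffer; reused for every block
        int index = 0;
        int read;
        // readNBytes fills the buffer unless the stream ends first; 0 means end of stream.
        while ((read = in.readNBytes(buffer, 0, blockSize)) > 0) {
            sink.putBlock(index++, buffer, read); // blocks until this block is handed off
        }
        sink.commit(index);
        return index;
    }

    /** Helper for demonstration: records the size of each block produced for the given data. */
    static List<Integer> blockSizes(byte[] data, int blockSize) {
        List<Integer> sizes = new ArrayList<>();
        BlockSink recorder = new BlockSink() {
            @Override
            public void putBlock(int index, byte[] buffer, int length) {
                sizes.add(length);
            }

            @Override
            public void commit(int blockCount) {
                // nothing to do in the sketch
            }
        };
        try {
            upload(new ByteArrayInputStream(data), blockSize, recorder);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sizes;
    }

    public static void main(String[] args) {
        System.out.println(blockSizes(new byte[10], 4));
    }
}
```

A parallel variant would need one buffer per in-flight block, which is the overhead the sequential approach avoids at the cost of throughput.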

Closes #66385
2020-12-17 11:09:55 +01:00
Francisco Fernández Castaño 54fbd03052
Mute AzureBlobStoreRepositoryTests (#66389) 2020-12-15 19:06:28 +01:00
Francisco Fernández Castaño fd1d282ba9
Upgrade Azure repository SDK to v12 (#65140)
Upgrade Azure repository to the latest non blocking Azure SDK.

Closes https://github.com/elastic/elasticsearch/issues/43309

Co-authored-by: Ryan Ernst <ryan@iernst.net>
2020-12-15 11:39:06 +01:00
James Baiera 9bb6a3ad2d
Add HDFS searchable snapshot integration (#66185)
Adds a bounded read implementation on the HDFS blob store as well as integration tests to 
the searchable snapshot project that ensures functionality on both kerberos and simple 
authentication HDFS.
2020-12-14 16:04:41 -05:00
Martijn Laarman e31e3dea32
Add `visibility` to the rest-spec-api (#56104) 2020-12-14 12:23:28 +01:00
Rene Groeschke defaa93902
Avoid tasks materialized during configuration phase (#65922)
* Avoid tasks materialized during configuration phase
* Fix RestTestFromSnippet testRoot setup
2020-12-12 16:14:17 +01:00
Dan Hermann 51452d1ae3
Mute failing AttachmentProcessor.testIndexedCharsWithResourceName test (#66121) 2020-12-09 11:24:57 -06:00
Martijn Laarman 8d3def3e1f
Add Accept & Content-Type headers to rest api spec (#53979)
Co-authored-by: Russ Cam <russ.cam@elastic.co>
2020-12-09 14:43:05 +01:00
Dan Hermann 149f1f9412
Minor DRYing up of attachment processor tests (#65975) 2020-12-08 11:38:00 -06:00
Rene Groeschke 0911d04467
Make AntFixture handling task provider api compliant (#65832)
This tweaks the AntFixture handling to make it compliant with the task avoidance api.
Tasks of type StandaloneRestTestTask are now generally finalised by using the typed ant stop task
which allows us to remove error-prone dependsOn overrides in StandaloneRestTestTask. As a result
we also ported more task definitions in the build to task avoidance api.

The next work item regarding AntFixture handling is porting AntFixture to a plain Gradle task and removing
Groovy AntBuilder, which will allow us to port more build logic from Groovy to Java but is out of the scope of
this PR.
2020-12-08 13:07:36 +01:00
yangyaofei 0f8476361c
Attachment ingest processor: add resource_name field (#64389) 2020-12-07 11:46:20 -06:00
Alexander Reelsen fd3d7e3368
Remove class that is part of commons-codec (#65259)
This class was copied from the trunk and is now part of the stable 
release since 2012.
2020-12-03 17:55:18 +01:00
Armin Braun 4547d3b245
Refactor ActionListener#map towards Stricter API (#65526)
Make `#map` look and feel a little nicer, optimize chains of `#map`,
and replace `#delegateFailure` calls with `#map` calls where possible
in order to enforce callbacks not throwing where possible.
2020-12-01 03:00:51 +01:00
Armin Braun 06a31a0aca
Add List Append Utility Method (#65576)
(list -> copy -> add one -> wrap immutable) is a pretty common pattern in CS
updates and tests => added a shortcut for it here and used it in easily identifiable
spots.
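
The pattern in question, as a minimal sketch. The class and method names here are hypothetical and not necessarily those used in the commit; the body simply follows the four steps listed above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class ListAppendSketch {

    /** Returns an unmodifiable copy of {@code list} with {@code element} appended; the input is left untouched. */
    static <T> List<T> appendToCopy(List<T> list, T element) {
        List<T> copy = new ArrayList<>(list.size() + 1); // copy, sized for one extra element
        copy.addAll(list);
        copy.add(element);                               // add one
        return Collections.unmodifiableList(copy);       // wrap immutable
    }

    public static void main(String[] args) {
        List<String> nodes = Arrays.asList("node-1", "node-2");
        List<String> updated = appendToCopy(nodes, "node-3");
        System.out.println(nodes + " -> " + updated);
    }
}
```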
2020-12-01 02:47:21 +01:00
Alan Woodward 1a8ce8716d
Restore use of default search and search_quote analyzers (#65491)
In the refactoring of TextFieldMapper, we lost the ability to define
a default search or search_quote analyzer in index settings. This
commit restores that ability, and adds some more comprehensive
testing.

Fixes #65434
2020-11-26 16:57:45 +00:00
Rene Groeschke 97749a3372
Port rest integ tests to use task avoidance api (#65011)
This ports the majority of the rest integ tests tasks to use the task avoidance api.

- There are some edge cases left that we need to investigate, but we can do that separately.
2020-11-26 10:30:06 +01:00
Alan Woodward d088171a87
Use ValueFetcher when loading text snippets to highlight (#63572)
HighlighterUtils.loadFieldValues() loads values directly from the source, and
then callers have to deal with filtering out values that would have been removed
by an ignore_above filter on keyword fields. Instead, we can use the
ValueFetcher for the relevant field, which handles all this logic for us.

Closes #59931.
2020-11-24 16:09:37 +00:00
Julie Tibshirani f4a462d05e
Simplify how source is passed to fetch subphases. (#65292)
This PR simplifies how the document source is passed to each fetch subphase. A summary of the strategy:
* For each document, we try to eagerly load the source and store it on `HitContext`. Most subphases that access source, like source filtering and highlighting, use `HitContext`. For nested hits, we filter the parent source and also store this source on `HitContext`.
* Only for non-nested documents, we also store the loaded source on `QueryShardContext#lookup`. This allows subphases that access source through `SearchLookup` to use the pre-loaded source when possible. This is now a common occurrence, since runtime fields are supported in the 'fields' option and may soon be supported in highlighting.

There is no longer a special `SearchLookup` just for the fetch phase. This was not necessary and was mostly caused by a misunderstanding of how `QueryShardContext` should be used.

Addresses #62511.
2020-11-20 14:09:41 -08:00
Ryan Ernst 23a47cebf1
Add plugin permission validation (#64751)
Security manager policies within plugins currently can ask to grant any
permission (though we block some within the security manager itself at
runtime). Yet most of these permissions should never be necessary, and
some we would actively not want any plugins to be allowed to use. This
commit adds validation of plugins' policy files to restrict the
permissions allowed to be granted to a subset that is reasonable for
plugins to need. The allowed permissions are not ideal (still containing
things like suppressAccessChecks), but it is a step forward in defining
a stricter model for plugins that reduces the surface area of potential
abuse.
2020-11-19 14:21:34 -08:00
Ryan Ernst 06b2deb674
Move security manager codebases files to plugin-metadata (#65243)
The codebases files are hints that allow the test framework to map
codebase names required by plugin security policy files to urls that may
exist as classes directories on disk. These files end up in the same
test resources directory in gradle, but in eclipse they exist in
different directories. This commit reorganizes the files so they exist
within the same plugin-metadata source, thereby existing in the same
output directory used by tests. Finally, the codebases files are
filtered out of the final jar within the plugin build.
2020-11-19 09:18:41 -08:00
Rory Hunter fd675fd836
Introduce licensed plugins (#64850)
This PR introduces the concept of "licensed" plugins. Such plugins
may only be installed on installations of the default distribution,
and this is enforced by the plugin installer. This PR also moves
the `quota-aware-fs` plugin to the `x-pack` directory, and marks
it as licensed.

Note that I didn't move the plugin source under `x-pack/plugin`
because all the existing x-pack plugins are actually bundled as
modules into the default distribution, whereas the `quota-aware-fs`
plugin needs to remain a standalone plugin.
2020-11-17 16:21:57 +00:00
Mark Vieira c0ba2ec875
Remove shutdown hook permission from hdfs plugin (#65016) 2020-11-12 13:13:12 -08:00
Mark Vieira fef57fb367 Revert "Remove shutdown hook permission from hdfs plugin (#64899)"
This reverts commit 974766d7
2020-11-12 08:24:34 -08:00
Mark Vieira e65c044b56 Revert "Remove commented out dependency"
This reverts commit 664c293d
2020-11-12 08:24:29 -08:00
Rene Groeschke 810e7ff6b0
Move tasks in build scripts to task avoidance api (#64046)
- Some trivial cleanup on build scripts
- Change task referencing in build scripts to use task avoidance api
where replacement is trivial.
2020-11-12 12:04:15 +01:00