5915 Commits

Author SHA1 Message Date
Nik Everett e686e18819
Simpler regex constants in painless (#68486)
Replaces the double `Pattern.compile` invocations in painless scripts
with the fancy constant injection we added in #68088. This caused one of
the tests to fail. It turns out that we weren't fully iterating the IR
tree during the constant folding phases. I started experimenting and
added a ton of tests that failed. Then I fixed them by changing the IR
tree walking code.
2021-02-03 16:51:01 -05:00
nagads 2af3e8e7e2
[Painless] Augmentation.join can't handle empty strings at the start (#68251)
Fixes #33434
2021-02-03 12:30:54 -05:00
Mark Vieira a92a647b9f
Update sources with new SSPL+Elastic-2.0 license headers
As per the new licensing change for Elasticsearch and Kibana this commit
moves existing Apache 2.0 licensed source code to the new dual license
SSPL+Elastic license 2.0. In addition, existing x-pack code now uses
the new version 2.0 of the Elastic license. Full changes include:

 - Updating LICENSE and NOTICE files throughout the code base, as well
   as those packaged in our published artifacts
 - Update IDE integration to now use the new license header on newly
   created source files
 - Remove references to the "OSS" distribution from our documentation
 - Update build time verification checks to no longer allow Apache 2.0
   license header in Elasticsearch source code
 - Replace all existing Apache 2.0 license headers for non-xpack code
   with updated header (vendored code with Apache 2.0 headers obviously
   remains the same).
 - Replace all Elastic license 1.0 headers with new 2.0 header in xpack.
2021-02-02 16:10:53 -08:00
Stuart Tettemer 97eda6fab2
Scripting: refactor use of stdlib extractor (#68402)
If there's no java stdlib path, `StdlibJavadocExtractor` is unnecessary.

This creates a separate code path for that case, which removes a
bunch of checking that `StdlibJavadocExtractor` is `null`.
2021-02-02 12:28:11 -06:00
Rory Hunter b4514228f0
Replace NOT operator with explicit `false` check - part 5 (#68360)
Part 5.

We have an in-house rule to compare explicitly against `false` instead
of using the logical not operator (`!`). However, this hasn't
historically been enforced, meaning that there are many violations in
the source at present.

We now have a Checkstyle rule that can detect these cases, but before we
can turn it on, we need to fix the existing violations. This is being
done over a series of PRs, since there are a lot to fix.
2021-02-02 14:27:33 +00:00
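The rule described in #68360 can be illustrated with a small Java sketch. The class and method names below are hypothetical and are not taken from the Elasticsearch source; both methods behave identically, and only the second follows the in-house style.

```java
import java.util.List;

final class ExplicitFalseCheck {
    // Style the series of PRs moves away from: the `!` is easy to miss when skimming.
    static boolean isMissingWithNot(List<String> values, String key) {
        return !values.contains(key);
    }

    // Style the in-house rule asks for: compare explicitly against `false`.
    static boolean isMissingExplicit(List<String> values, String key) {
        return values.contains(key) == false;
    }

    public static void main(String[] args) {
        List<String> fields = List.of("message", "status");
        System.out.println(isMissingWithNot(fields, "verb"));
        System.out.println(isMissingExplicit(fields, "verb"));
    }
}
```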
Stuart Tettemer 460b33f9b2
Scripting: readable array types for context api (#68237)
Instead of fixing up array names post-hoc when generating the api spec
and painless context docs, fix them up in the `_scripts/painless/_context`
API call.
2021-02-01 15:10:23 -06:00
Nik Everett 419ce10989
Add grok and dissect methods to runtime fields (#68088)
This adds a `grok` and a `dissect` method to runtime fields which
returns a `Matcher` style object you can use to get the matched
patterns. A fairly simple script to extract the "verb" from an apache
log line with `grok` would look like this:
```
String verb = grok('%{COMMONAPACHELOG}').extract(doc["message"].value)?.verb;
if (verb != null) {
  emit(verb);
}
```

And `dissect` would look like:
```
String verb = dissect('%{clientip} %{ident} %{auth} [%{@timestamp}] "%{verb} %{request} HTTP/%{httpversion}" %{status} %{size}').extract(doc["message"].value)?.verb;
if (verb != null) {
  emit(verb);
}
```

We'll work later to get it down to a clean looking one liner, but for
now, this'll do.

The `grok` and `dissect` methods are special in that they only run at
script compile time. You can't pass non-constants to them. They'll
produce compile errors if you send in a bad pattern. This is nice
because they can be expensive to "compile" and there are many other
optimizations we can make when the patterns are available up front.

Closes #67825
2021-02-01 14:16:01 -05:00
Ignacio Vera 747773d5af
Upgrade to Lucene 8.8.0 (#68272) 2021-02-01 13:36:03 +01:00
Nik Everett b52ea0b02b
Fix painless build in eclipse (#68166)
Painless now has a `doc` source which has its own dependencies. That's
lovely for everything but Eclipse doesn't understand that sort of thing.
This adds the docs dependencies to the regular build path when building
with Eclipse.
2021-01-29 09:40:24 -05:00
Alan Woodward d981cf2dff
Remove intermediate SearchLookup classes (#68052)
SearchLookup has two intermediate classes, DocMap and StoredFieldsLookup, that
are simple factories for their Leaf implementations. They are never accessed
outside SearchLookup, with the exception of two calls on DocMap that can be
easily refactored. This commit removes them, making SearchLookup.getLeafSearchLookup
directly responsible for creating the leaf lookups.
2021-01-29 10:44:05 +00:00
Stuart Tettemer 7c98d9a052
Scripting: Parse stdlib files for parameter names (#67837)
* Scripting: Parse stdlib files for parameter names

* Task `generateContextApiSpec` takes optional system parameter
  `jdksrc` with path to extracted java standard library source
  files.
* stdlib source files are the source of `parameter_names` list
  and the `javadoc` value.
* javadoc values may contain newlines and markup such as
  `{@code XX}`, `<p>`, `@throws`

Example method:
```
{
  "declaring": "Appendable",
  "name": "append",
  "return": "Appendable",
  "javadoc": "Appends a subsequence of the specified character sequence to this...",
  "parameters": ["CharSequence", "int", "int" ],
  "parameter_names": ["csq", "start", "end"]
}
```
2021-01-28 13:38:47 -06:00
Gordon Brown 4fe7a612fc
Allow population of Enrich indices to work with System Index protections (#67406)
This PR does three things:
1) Tweaks existing reindex infrastructure so that different clients can be used for the "search" part and the "index" part of a reindex operation, and
2) Modifies Enrich to take advantage of this to perform the "search" part in the security context of the current user (so that DLS/FLS etc. are properly applied) while performing the "index" part in the security context of the Enrich plugin (so that access to system indices, and `.enrich-*` in particular, is allowed regardless of the permissions of the current user).
3) Adds integration tests for the above, to verify that Enrich does not leak info protected by DLS and/or FLS.

Co-authored-by: Jay Modi <jay.modi@elastic.co>
2021-01-28 10:17:26 -07:00
Alan Woodward f6aed63442
Simplify getting persistent map from SourceLookup (#67331)
SourceLookup has two different ways of getting its internal source as an object
that will persist once the lookup has moved to a separate object. The first,
source(), can return null, and so only works after the source map has been
set explicitly, or one of the other map functions has been called. The second,
loadSourceIfNeeded() will never return null, and can be used to lazily load
values from stored fields.

In the past, the distinction between these two methods was important because
you could use a null check on source() to see if the source field was enabled.
We can now do this by checking the isSourceEnabled() method on
SearchExecutionContext.

This commit merges both methods into a single source() method, with the semantics
of the old loadSourceIfNeeded() method.
2021-01-27 11:08:22 +00:00
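The merged semantics described in #67331 (a single source() that loads lazily and never returns null) could look roughly like the sketch below. `LazySource` and its `loader` are hypothetical stand-ins, not the actual SourceLookup code, and the empty-map fallback is an assumption made so that the method can never return null.

```java
import java.util.Map;
import java.util.function.Supplier;

final class LazySource {
    // Hypothetical stand-in for reading the source from stored fields.
    private final Supplier<Map<String, Object>> loader;
    private Map<String, Object> source;

    LazySource(Supplier<Map<String, Object>> loader) {
        this.loader = loader;
    }

    // Loads the source on first use, caches it, and never returns null.
    Map<String, Object> source() {
        if (source == null) {
            Map<String, Object> loaded = loader.get();
            // Assumption: fall back to an empty map when nothing could be loaded.
            source = loaded == null ? Map.of() : loaded;
        }
        return source;
    }
}
```

Callers that used to null-check source() to detect a disabled source field would instead consult isSourceEnabled(), as the commit message says.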
Rory Hunter ad1f876daa
Replace NOT operator with explicit `false` check (#67817)
We have an in-house rule to compare explicitly against `false` instead
of using the logical not operator (`!`). However, this hasn't
historically been enforced, meaning that there are many violations in
the source at present.

We now have a Checkstyle rule that can detect these cases, but before we
can turn it on, we need to fix the existing violations. This is being
done over a series of PRs, since there are a lot to fix.
2021-01-26 14:47:09 +00:00
Dan Hermann b330493a4b
Rename mime_type configuration option to media_type (#67860) 2021-01-25 11:29:12 -06:00
Przemyslaw Gomulka c2c50d5aed
Make scripted search templates work with new mediaType from XContentType.JSON (#67677)
Stored scripts can have a content_type option set; when it is empty, it defaults to XContentType.JSON#mediaType(). Commit 5e74f79 changed this method in master (ES8) to return application/json;charset=utf-8 (previously application/json; charset=UTF-8).
This means that after upgrading ES from version 7 to 8, a stored script will fail when it is used, because the encoder is matched by string equality (map key).

This commit addresses this by adding the old application/json; charset=UTF-8 back into the encoders map, in addition to the new value.

closes #66986
2021-01-21 12:03:38 +01:00
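A minimal sketch of the fix in #67677, assuming the encoders live in a plain map keyed by the media type string. The two keys are the strings quoted in the commit message; the class name, the map shape, and the toy encoder are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.UnaryOperator;

final class MediaTypeEncoders {
    // Media type strings quoted in the commit message.
    static final String NEW_JSON = "application/json;charset=utf-8";
    static final String OLD_JSON = "application/json; charset=UTF-8";

    // Hypothetical stand-in for a JSON encoder; it only escapes backslashes and quotes.
    static final UnaryOperator<String> JSON_ENCODER =
        s -> s.replace("\\", "\\\\").replace("\"", "\\\"");

    static Map<String, UnaryOperator<String>> buildEncoders() {
        Map<String, UnaryOperator<String>> encoders = new HashMap<>();
        encoders.put(NEW_JSON, JSON_ENCODER);
        // Register the old spelling as well, so scripts stored by version 7 still resolve.
        encoders.put(OLD_JSON, JSON_ENCODER);
        return encoders;
    }
}
```

Because the lookup is by exact string, registering both spellings is the simplest way to keep old stored scripts working without normalizing keys.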
Jim Ferenczi e77c523bd9
Upgrade to a new lucene 8.8.0 snapshot (#67691)
This change upgrades to the latest Lucene 8.8.0 snapshot.
It also restores the compression on binary doc values that was lost in the last snapshot upgrade.
The compression is now configurable on binary doc values but we don't expose this functionality yet so this commit ensures that we pick the same compression mode as previous releases (BEST_COMPRESSION).
2021-01-19 13:33:19 +01:00
Armin Braun 6d025d3a27
Log Slowness on Sending Transport Messages (#67664)
Similar to #62444 but for the outbound path.

This does not detect slowness in individual transport handler logic,
this is done via the inbound handler logging already, but instead
warns if it takes a long time to hand off the message to the relevant
transport thread and then transfer the message over the wire.
This gives some visibility into the stability of the network
connection itself and into the reasons for slow network
responses (if they are the result of slow networking on the sender).
2021-01-19 12:19:32 +01:00
Mayya Sharipova 76482210b8
Add linear function to rank_feature query (#67438)
This adds a linear function to the set of functions available
for rank_feature query

Closes #49859
2021-01-18 11:44:13 -05:00
Rory Hunter 1a05a5ac24
Introduce deprecation categories (#67443)
Closes #64824. Introduce the concept of categories to deprecation
logging. Every location where we log a deprecation message must now
include a deprecation category.
2021-01-18 16:16:54 +00:00
Rene Groeschke ca96612245
Remove debugging printlns from build scripts 2021-01-18 15:19:19 +01:00
Rene Groeschke f83d545b81
Port UrlFixture to test fixture plugin (#67169)
- Port UrlFixture to test fixture plugin
- Avoid exposing PID and PORT for http fixture when not required
- Make AbstractHttpFixture work inside and outside docker
- Check directories when running UrlFixture
2021-01-18 14:59:18 +01:00
gf2121 92f85981a7
Avoid duplicate serialization for TermsQueryBuilder (#67223)
Avoid duplicate serialization for TermsQuery.
2021-01-18 09:04:29 +01:00
Nik Everett 217c9e0c04
Fix painless tests in eclipse (#67602)
`Augmentation.java` had a zero width space [1] in two method
definitions:
```
    public static String[] split(Pattern receiver, int limitFactor, CharSequence input, int limit) {
                                ^------- Right before the ( character
    public static Stream<String> splitAsStream(Pattern receiver, int limitFactor, CharSequence input) {
                                              ^ Right before the ( here too
```

Sadly, Eclipse and javac treat this character differently. Eclipse seems
to include it in the method name and javac seems to treat it as regular
space. This caused all the unit tests for painless to fail to load
because they couldn't find the `split` and `splitAsStream`
augmentations. But if you listed all of the methods they looked like
they were there. If you crack open the line in a hex editor you can see
it.

Eclipse is tracking [2] similar issues.

[1]: https://en.wikipedia.org/wiki/Zero-width_space
[2]: https://bugs.eclipse.org/bugs/show_bug.cgi?id=547601
2021-01-15 14:42:18 -05:00
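Besides a hex editor, a stray zero width space like the one described in #67602 can be located with a few lines of Java. This is only an illustrative sketch; `ZeroWidthSpaceFinder` is a hypothetical helper and not part of the fix.

```java
final class ZeroWidthSpaceFinder {
    static final char ZERO_WIDTH_SPACE = '\u200B';

    // Returns the index of the first zero width space in the line, or -1 if there is none.
    static int firstZeroWidthSpace(String line) {
        return line.indexOf(ZERO_WIDTH_SPACE);
    }

    public static void main(String[] args) {
        String clean = "public static String[] split(Pattern receiver) {";
        String dirty = "public static String[] split\u200B(Pattern receiver) {";
        System.out.println(firstZeroWidthSpace(clean));
        System.out.println(firstZeroWidthSpace(dirty));
    }
}
```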
Julie Tibshirani 5852fbedf5
Rename QueryShardContext -> SearchExecutionContext. (#67490)
We decided to rename `QueryShardContext` to clarify that it supports all parts
of search request execution. Before there was confusion over whether it should
only be used for building queries, or maybe only used in the query phase. This
PR also updates the javadocs.

Closes #64740.
2021-01-14 09:11:59 -08:00
Luca Cavanna df7041f45a
Remove last DocumentMapper reference from MappingLookup (#67157)
As part of #66295 we made QueryShardContext perform mapping lookups through MappingLookup rather than MapperService. That helps as MapperService relies on DocumentMapper, which may change throughout the execution of the search request. At search time, the percolate query also needs to parse documents, which made us add a parse method to MappingLookup. This parse method currently relies on calling DocumentMapper#parseDocument through a function, but we would rather make this easier to follow (see https://github.com/elastic/elasticsearch/pull/66295/files#r544639868).

We recently removed the need to provide the entire DocumentMapper to DocumentParser#parse, opening the possibility for using DocumentParser directly when needing to parse a document at query time. This commit adds everything that is needed (namely Mapping, IndexSettings and IndexAnalyzers) to MappingLookup so that it can parse a document through DocumentParser without relying on DocumentMapper.

As a bonus, given that MappingLookup holds a reference to these three additional objects, we can make DocumentMapper rely on MappingLookup to retrieve those and not hold its own same references to them.
Along the same lines, given that MappingLookup holds all that's necessary to parse a document, the signature of DocumentParser#parse can be simplified by replacing most of its arguments with MappingLookup and retrieving what is needed from it.
2021-01-12 11:48:51 +01:00
Ignacio Vera 604ee06a3b
Upgrade to lucene-8.8-snapshot-f73f6b1 (#67228) 2021-01-12 08:03:00 +01:00
Dan Hermann eddab39e2f
Configurable MIME type for mustache template encoding on set processor (#65314) 2021-01-07 07:40:57 -06:00
Tim Vernum 248b6a89e8
Update template warning for FIPS in Netty test (#67067)
This changes the expected error message (on FIPS) so that the
order of the templates (and their associated patterns) matches
the (newly updated) order generated by the server.

Relates: #67066
Resolves: #66820
2021-01-07 12:01:23 +11:00
Stuart Tettemer 8a001d1a40
Scripting: whitelist Json functions for ingest (#67118) 2021-01-06 13:08:01 -06:00
Stuart Tettemer 93bc36ef6f
Scripting: Add OSS whitelist to execute API (#67038)
* Scripting: Add OSS whitelist to execute API

* Ingest
* Score
* MovFn
* Json

Fixes: #67035
2021-01-05 15:27:27 -06:00
Przemko Robakowski e1c6cbced7
Fix whitespace as a separator in CSV processor (#67045)
This change fixes a problem when using space or tab as a separator in the CSV processor: we now check whether the current character is the separator before we check whether it is whitespace.

This also improves tests to always check all combinations of separators and quotes.

Closes #67013
2021-01-05 22:19:31 +01:00
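The ordering issue fixed in #67045 can be shown with a deliberately simplified splitter. This sketch is an assumption-laden illustration: it ignores quoting entirely, and `WhitespaceSeparatorCsv` is a hypothetical class, not the CSV processor's parser.

```java
import java.util.ArrayList;
import java.util.List;

final class WhitespaceSeparatorCsv {
    // Splits an unquoted line on `separator`, skipping whitespace at the start of each field.
    // The separator test comes first, so a space or tab separator is never skipped as padding.
    static List<String> split(String line, char separator) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == separator) {
                // End of field: emit it and start a new one.
                fields.add(current.toString());
                current.setLength(0);
            } else if (Character.isWhitespace(c) && current.length() == 0) {
                // Leading padding before a field's first character is ignored.
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());
        return fields;
    }
}
```

If the two `if` branches were swapped, a line such as `a b c` with a space separator would lose its field boundaries, which is the kind of failure the commit describes.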
Jack Conradson b0eb81301a
Fix static inner class resolution in Painless (#67027)
When removing the "lexer hack" to remove type context from the lexer, static inner class resolution 
wasn't properly accounted for. This change adds code to handle static inner class resolution.
2021-01-05 11:05:24 -08:00
David Turner b3e550c289
Make InternalClusterInfoService async (#66993)
This commit reworks the InternalClusterInfoService to run
asynchronously, using timeouts on the stats requests instead of
implementing its own blocking timeouts. It also improves the logging of
failures by identifying the nodes that failed or timed out. Finally it
ensures that only a single refresh is running at once, enqueueing later
refresh requests to run immediately after the current refresh is
finished rather than racing them against each other.
2021-01-05 17:58:30 +00:00
Przemyslaw Gomulka 5e74f79e22
Support response content-type with versioned media type (#65500)
This commit allows returning a correct requested response content-type - it did not work for versioned media types.
It is done by adding new vendor specific instances to XContent and TextFormat enums. These instances can then "format" the response content type string when provided with parameters. This is similar to what SQL plugin does with its media types.

#51816
2021-01-05 09:23:22 +01:00
Jack Conradson fbedb66075
Remove leniency for casting from def to void in Painless (#66957)
This leniency was originally for lambda and method reference conversions, but they are both special
cased now. This change removes the unnecessary leniency of a cast from a def type to a void
type. This also fixes (#66175).
2021-01-04 15:00:02 -08:00
Luca Cavanna dbefc05e6e
Don't require DocumentMapper as an argument when parsing a document (#66780)
Currently, an incoming document is parsed through `DocumentMapper#parse`, which in turn calls `DocumentParser#parseDocument` providing `this` among other arguments. As part of the effort to reduce usages of `DocumentMapper` when possible, as it represents the mutable side of mappings (through mappings updates) and involves complexity, we can carry around only the needed components. This does add some required arguments to `DocumentParser#parseDocument`, though it makes dependencies clearer. This change does not affect end consumers as they all go through DocumentMapper anyway, but by no longer needing to provide DocumentMapper to parseDocument, we may be able to unblock further improvements down the line.

Relates to #66295
2021-01-04 15:34:44 +01:00
Rene Groeschke eee6e11883
Port all task definitions to task avoidance api (#66738)
This finishes porting all tasks created in gradle build scripts and plugins to use 
the task avoidance api (see #56610)

* Port all task definitions to task avoidance api
* Fix last task created during configuration
* Fix test setup in  :modules:reindex
* Declare proper task inputs
2021-01-04 12:32:19 +01:00
Mark Tozzi e26c9bbd52
Rename BYTES ValuesSourceType to reflect intended usage (#66762) 2020-12-30 12:39:17 -05:00
Mayya Sharipova 5b6675ab0d
Mute testTemplateExists (#66863)
Mute Netty4HeadBodyIsEmptyIT.testTemplateExists, as it fails in FIPS
mode.

Relates to #66820
2020-12-29 10:45:16 -05:00
Tim Vernum 22bc833d85
Skip netty4 yaml test in FIPS mode (#66842)
The "Netty loaded" YAML test asserts that the configured transport is
"netty4", however when in FIPS mode, the tests enable security and the
configured transport is "security4".

This change skips the netty4 yaml test when running in FIPS mode.

Resolves: #66818
2020-12-29 18:28:23 +11:00
Przemyslaw Gomulka 8f74f18257
Fix ingest java week based year defaulting (#65717)
If year, year of era, or week-based year is not specified, the ingest Java
date processor defaults the year to the current year.
However, the current implementation mistook the weekBasedYear field for
weekOfWeekBasedYear. This has led to incorrect defaulting.

relates #63458
2020-12-28 10:49:31 +01:00
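The two fields mixed up in #65717 are easy to tell apart with `java.time.temporal.WeekFields`. The sketch below uses the ISO week definition and a hypothetical helper class; it only shows that the two fields return different things and is not the processor's defaulting code.

```java
import java.time.LocalDate;
import java.time.temporal.WeekFields;

final class WeekBasedYearFields {
    // The year that the date's ISO week belongs to.
    static int weekBasedYear(LocalDate date) {
        return date.get(WeekFields.ISO.weekBasedYear());
    }

    // The number of the ISO week within that week-based year.
    static int weekOfWeekBasedYear(LocalDate date) {
        return date.get(WeekFields.ISO.weekOfWeekBasedYear());
    }

    public static void main(String[] args) {
        // 2021-01-01 is a Friday and still falls in the last ISO week of 2020.
        LocalDate newYearsDay = LocalDate.of(2021, 1, 1);
        System.out.println(weekBasedYear(newYearsDay));       // 2020
        System.out.println(weekOfWeekBasedYear(newYearsDay)); // 53
    }
}
```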
Ioannis Kakavas bd873698bc
Ensure CI is run in FIPS 140 approved only mode (#64024)
We were depending on the BouncyCastle FIPS own mechanics to set
itself in approved only mode since we run with the Security
Manager enabled. The check during startup seems to happen before we
set our restrictive SecurityManager though in
org.elasticsearch.bootstrap.Elasticsearch , and this means that
BCFIPS would not be in approved only mode, unless explicitly
configured so.

This commit sets the appropriate JVM property to explicitly set
BCFIPS in approved only mode in CI and adds tests to ensure that we
will be running with BCFIPS in approved only mode when we expect to.
It also sets xpack.security.fips_mode.enabled to true for all test clusters
used in fips mode and sets the distribution to the default one. It adds a
password to the elasticsearch keystore for all test clusters that run in fips
mode.
Moreover, it changes a few unit tests where we would use bcrypt even in
FIPS 140 mode. These would still pass since we are bundling our own
bcrypt implementation, but are now changed to use FIPS 140 approved
algorithms instead for better coverage.

It also addresses a number of tests that would fail in approved only mode.
Mainly:

 - Tests that use PBKDF2 with a password less than 112 bits (14char). We
   elected to change the passwords used everywhere to be at least 14
   characters long instead of mandating
   the use of pbkdf2_stretch because both pbkdf2 and
   pbkdf2_stretch are supported and allowed in fips mode and it makes sense
   to test with both. We could possibly figure out the password algorithm used
   for each test and adjust password length accordingly only for pbkdf2 but
   there is little value in that. It's good practice to use strong passwords so if
   our docs and tests use longer passwords, then it's for the best. The approach
   is brittle as there is no guarantee that the next test that will be added won't
   use a short password, so we add some testing documentation too.
   This leaves us with a possible coverage gap since we do support passwords
   as short as 6 characters but we only test with > 14 chars but the
   validation itself was not tested even before. Tests can be added in a followup,
   outside of fips related context.

 - Tests that use a PKCS12 keystore and were not already muted.

 - Tests that depend on running test clusters with a basic license or
   using the OSS distribution, as FIPS 140 support is not available in
   either of these.

Finally, it adds some information around FIPS 140 testing in our testing
documentation reference so that developers can hopefully keep in
mind fips 140 related intricacies when writing/changing docs.
2020-12-23 21:00:49 +02:00
Jim Ferenczi c756ce1acf
Sort field tiebreaker for PIT (point in time) readers (#66093)
This commit introduces a new sort field called `_shard_doc` that
can be used in conjunction with a PIT to consistently tiebreak
identical sort values. The sort value is a numeric long that is
composed of the ordinal of the shard (assigned by the coordinating node)
and the internal Lucene document ID. These two values are consistent within
a PIT so this sort criteria can be used as the tiebreaker of any search
requests.
Since this sort criteria is stable we'd like to add it automatically to any
sorted search requests that use a PIT but we also need to expose it explicitly
in order to be able to:
* Reverse the order of the tiebreaking, useful to search "before" `search_after`.
* Force the primary sort to use it in order to benefit from the `search_after` optimization when sorting by index order (to be released in Lucene 8.8).

I plan to add the documentation and the automatic configuration for PIT in a follow up since this change is already big.

Relates #56828
2020-12-18 12:13:12 +01:00
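The commit message for #66093 says the `_shard_doc` value is a long composed of the shard ordinal and the Lucene document ID, but it does not spell out the bit layout. The sketch below assumes the ordinal goes in the upper 32 bits and the doc ID in the lower 32 bits; that layout and the class name are assumptions made for illustration only.

```java
final class ShardDocValue {
    // Assumed layout: shard ordinal in the upper 32 bits, Lucene doc ID in the lower 32 bits.
    static long encode(int shardOrdinal, int docId) {
        return ((long) shardOrdinal << 32) | (docId & 0xFFFFFFFFL);
    }

    // Recovers the shard ordinal from the upper half of the value.
    static int shardOrdinal(long value) {
        return (int) (value >>> 32);
    }

    // Recovers the doc ID from the lower half of the value.
    static int docId(long value) {
        return (int) value;
    }
}
```

With this packing, values sort by shard ordinal first and by doc ID second, which gives a total order that can break ties between identical sort values.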
Armin Braun 3819fcb582
Add Ability to Write a BytesReference to BlobContainer (#66501)
Except when writing actual segment files to the blob store
we always write `BytesReference` instead of a stream.
Only having the stream API available forces needless copies
on us. I fixed the straightforward needless copying for
HDFS and FS repos in this PR; we could do similar fixes for
GCS and Azure as well and thus significantly reduce the peak
memory use of these writes on master nodes in particular.
2020-12-17 17:42:29 +01:00
Julie Tibshirani d0683141f4
Ensure all query builder tests consider older versions. (#66401)
This PR removes outdated overrides in some tests that prevent them from testing
older index versions. Also removes an old comment + logic from
AggregatorFactoriesTests.
2020-12-16 09:19:26 -08:00
Nik Everett 7b3c6f2a0c
Further clean up in AggregatorTestCase (#66395)
Drops `AggregatorTestCase#mapperServiceMock` because it is getting in
the way of other work I'm doing for runtime fields. It was only
overridden to test the `parent` and `child` aggregation to add the
`MappedFieldType`s for join fields in the backdoor. Those aggregations
can just as easily add those fields in the normal method calls.
2020-12-16 11:56:04 -05:00
Jim Ferenczi 6d1f43c6d2
Fix search_as_you_type field with term_vector (#66432)
This commit fixes a bug in the search_as_you_type field that was introduced during
the refactoring of the field mapper. The prefix field that is used internally
by the search_as_you_type mapper doesn't need term vectors even if they are activated
on the main field. So this commit ensures that we don't copy the options from the main
field when we create the prefix sub-field.

Closes #66407
2020-12-16 17:04:51 +01:00
Martijn Laarman e31e3dea32
Add `visibility` to the rest-spec-api (#56104) 2020-12-14 12:23:28 +01:00
Rene Groeschke defaa93902
Avoid tasks materialized during configuration phase (#65922)
* Avoid tasks materialized during configuration phase
* Fix RestTestFromSnippet testRoot setup
2020-12-12 16:14:17 +01:00