Replaces the double `Pattern.compile` invocations in painless scripts
with the fancy constant injection we added in #68088. This caused one of
the tests to fail. It turns out that we weren't fully iterating the IR
tree during the constant folding phases. I started experimenting and
added a ton of tests that failed. Then I fixed them by changing the IR
tree walking code.
As per the new licensing change for Elasticsearch and Kibana, this commit
moves existing Apache 2.0 licensed source code to the new dual
SSPL+Elastic License 2.0 license. In addition, existing x-pack code now uses
the new version 2.0 of the Elastic License. Full changes include:
- Update LICENSE and NOTICE files throughout the code base, as well
as those packaged in our published artifacts
- Update IDE integration to now use the new license header on newly
created source files
- Remove references to the "OSS" distribution from our documentation
- Update build time verification checks to no longer allow Apache 2.0
license header in Elasticsearch source code
- Replace all existing Apache 2.0 license headers for non-xpack code
with updated header (vendored code with Apache 2.0 headers obviously
remains the same).
- Replace all Elastic license 1.0 headers with new 2.0 header in xpack.
If there's no Java stdlib path, `StdlibJavadocExtractor` is unnecessary.
This creates a separate code path for that case, which removes a
bunch of checking that `StdlibJavadocExtractor` is `null`.
Part 5.
We have an in-house rule to compare explicitly against `false` instead
of using the logical not operator (`!`). However, this hasn't
historically been enforced, meaning that there are many violations in
the source at present.
We now have a Checkstyle rule that can detect these cases, but before we
can turn it on, we need to fix the existing violations. This is being
done over a series of PRs, since there are a lot to fix.
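As an illustration, the rule rewrites the first form below into the second (the names here are made up for the example):
```
class BooleanStyleExample {
    private boolean indexExists;

    void ensureIndex() {
        // Before: logical not, easy to miss when skimming.
        // if (!indexExists) { createIndex(); }

        // After: explicit comparison against false, as the in-house rule requires.
        if (indexExists == false) {
            createIndex();
        }
    }

    private void createIndex() {}
}
```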
Instead of fixing up array names post-hoc when generating the api spec
and painless context docs, fix them up in the `_scripts/painless/_context`
API call.
This adds a `grok` and a `dissect` method to runtime fields, which
return a `Matcher`-style object you can use to get the matched
patterns. A fairly simple script to extract the "verb" from an apache
log line with `grok` would look like this:
```
String verb = grok('%{COMMONAPACHELOG}').extract(doc["message"].value)?.verb;
if (verb != null) {
    emit(verb);
}
```
And `dissect` would look like:
```
String verb = dissect('%{clientip} %{ident} %{auth} [%{@timestamp}] "%{verb} %{request} HTTP/%{httpversion}" %{status} %{size}').extract(doc["message"].value)?.verb;
if (verb != null) {
    emit(verb);
}
```
We'll work later to get it down to a clean-looking one-liner, but for
now, this'll do.
The `grok` and `dissect` methods are special in that they only run at
script compile time. You can't pass non-constants to them. They'll
produce compile errors if you send in a bad pattern. This is nice
because they can be expensive to "compile" and there are many other
optimizations we can make when the patterns are available up front.
Closes #67825
Painless now has a `doc` source which has its own dependencies. That's
lovely for everything, but Eclipse doesn't understand that sort of thing.
This adds the docs dependencies to the regular build path when building
with Eclipse.
SearchLookup has two intermediate classes, DocMap and StoredFieldsLookup, that
are simple factories for their Leaf implementations. They are never accessed
outside SearchLookup, with the exception of two calls on DocMap that can be
easily refactored. This commit removes them, making SearchLookup.getLeafSearchLookup
directly responsible for creating the leaf lookups.
* Scripting: Parse stdlib files for parameter names
* Task `generateContextApiSpec` takes an optional system parameter
`jdksrc` with the path to extracted Java standard library source
files.
* stdlib source files are the source of the `parameter_names` list
and the `javadoc` value.
* javadoc values may contain newlines and markup such as
`{@code XX}`, `<p>`, `@throws`
Example method:
```
{
  "declaring": "Appendable",
  "name": "append",
  "return": "Appendable",
  "javadoc": "Appends a subsequence of the specified character sequence to this...",
  "parameters": ["CharSequence", "int", "int"],
  "parameter_names": ["csq", "start", "end"]
}
```
This PR does three things:
1) Tweaks existing reindex infrastructure so that different clients can be used for the "search" part and the "index" part of a reindex operation,
2) Modifies Enrich to take advantage of this to perform the "search" part in the security context of the current user (so that DLS/FLS etc. are properly applied) while performing the "index" part in the security context of the Enrich plugin (so that access to system indices, and `.enrich-*` in particular, is allowed regardless of the permissions of the current user), and
3) Adds integration tests for the above, to verify that Enrich does not leak info protected by DLS and/or FLS.
Co-authored-by: Jay Modi <jay.modi@elastic.co>
SourceLookup has two different ways of getting its internal source as an object
that will persist once the lookup has moved to a separate object. The first,
source(), can return null, and so only works after the source map has been
set explicitly, or one of the other map functions has been called. The second,
loadSourceIfNeeded(), will never return null, and can be used to lazily load
values from stored fields.
In the past, the distinction between these two methods was important because
you could use a null check on source() to see if the source field was enabled.
We can now do this by checking the isSourceEnabled() method on
SearchExecutionContext.
This commit merges both methods into a single source() method, with the semantics
of the old loadSourceIfNeeded() method.
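A minimal sketch of the merged behavior, using hypothetical field and helper names rather than the actual SourceLookup internals:
```
import java.util.Map;
import java.util.function.Supplier;

// Sketch: the merged source() lazily loads from stored fields when no map was set,
// so callers never see null (the old loadSourceIfNeeded() semantics).
class LazySourceSketch {
    private Map<String, Object> source;                      // set explicitly, or loaded on demand
    private final Supplier<Map<String, Object>> storedFieldsLoader;

    LazySourceSketch(Supplier<Map<String, Object>> storedFieldsLoader) {
        this.storedFieldsLoader = storedFieldsLoader;
    }

    Map<String, Object> source() {
        if (source == null) {
            source = storedFieldsLoader.get();
        }
        return source;
    }

    void setSource(Map<String, Object> source) {
        this.source = source;
    }
}
```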
We have an in-house rule to compare explicitly against `false` instead
of using the logical not operator (`!`). However, this hasn't
historically been enforced, meaning that there are many violations in
the source at present.
We now have a Checkstyle rule that can detect these cases, but before we
can turn it on, we need to fix the existing violations. This is being
done over a series of PRs, since there are a lot to fix.
Stored scripts can have the content_type option set; however, when it is empty they default to XContentType.JSON#mediaType(). Commit 5e74f79 changed this method in master (ES8) to return application/json;charset=utf-8 (previously application/json; charset=UTF-8).
This means that when upgrading ES from version 7 to 8, a stored script will fail when used, as the encoder is matched with string equality (map key).
This commit addresses this by adding the old application/json; charset=UTF-8 back into the encoders map (in addition to the new form).
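A rough sketch of the idea, assuming a plain string-keyed encoders map (the names are illustrative, not the actual registry classes):
```
import java.util.HashMap;
import java.util.Map;

// Sketch: register both charset spellings so scripts stored by 7.x still resolve in 8.x,
// since the lookup is a plain string-equality map-key match.
class MediaTypeEncodersSketch {
    static final Map<String, String> ENCODERS = new HashMap<>();
    static {
        // New canonical form returned by XContentType.JSON#mediaType() on master (ES8).
        ENCODERS.put("application/json;charset=utf-8", "json");
        // Old form that stored scripts from 7.x may carry; added back so they keep working.
        ENCODERS.put("application/json; charset=UTF-8", "json");
    }
}
```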
Closes #66986
This change upgrades to the latest Lucene 8.8.0 snapshot.
It also restores the compression on binary doc values that was lost in the last snapshot upgrade.
The compression is now configurable on binary doc values, but we don't expose this functionality yet, so this commit ensures that we pick the same compression mode as previous releases (BEST_COMPRESSION).
Similar to #62444 but for the outbound path.
This does not detect slowness in individual transport handler logic
(that is already covered by the inbound handler logging); instead it
warns if it takes a long time to hand off the message to the relevant
transport thread and then transfer the message over the wire.
This gives some visibility into the stability of the network
connection itself and into the reasons for slow network
responses (if they are the result of slow networking on the sender).
Closes #64824. Introduce the concept of categories to deprecation
logging. Every location where we log a deprecation message must now
include a deprecation category.
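A hypothetical call site sketching the idea; the enum values and logger signature below are assumptions for illustration, not the exact Elasticsearch API:
```
// Hypothetical shape of a categorized deprecation call; names are illustrative only.
enum DeprecationCategory { API, SETTINGS, INDICES, SECURITY, OTHER }

class DeprecationLoggingSketch {
    interface CategorizedDeprecationLogger {
        void deprecate(DeprecationCategory category, String key, String message);
    }

    static void warnDeprecatedSetting(CategorizedDeprecationLogger logger) {
        // Every deprecation message must now name a category up front.
        logger.deprecate(
            DeprecationCategory.SETTINGS,
            "my_setting_deprecation",
            "the [my.setting] setting is deprecated and will be removed in a future release"
        );
    }
}
```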
- Port UrlFixture to test fixture plugin
- Avoid exposing PID and port for the HTTP fixture when not required
- Make AbstractHttpFixture work inside and outside docker
- Check directories when running UrlFixture
`Augmentation.java` had a zero width space [1] in two method
definitions:
```
public static String[] split(Pattern receiver, int limitFactor, CharSequence input, int limit) {
^------- Right before the ( character
public static Stream<String> splitAsStream(Pattern receiver, int limitFactor, CharSequence input) {
^ Right before the ( here too
```
Sadly, Eclipse and javac treat this character differently. Eclipse seems
to include it in the method name and javac seems to treat it as regular
space. This caused all the unit tests for painless to fail to load
because they couldn't find the `split` and `splitAsStream`
augmentations. But if you listed all of the methods they looked like
they were there. If you crack open the line in a hex editor you can see
it.
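A small standalone scanner (not part of the fix itself) that makes the character visible:
```
import java.nio.file.Files;
import java.nio.file.Path;

// Scan a source file for U+200B (zero-width space), which most editors render as nothing.
public class FindZeroWidthSpace {
    public static void main(String[] args) throws Exception {
        String[] lines = Files.readString(Path.of(args[0])).split("\n", -1);
        for (int i = 0; i < lines.length; i++) {
            int col = lines[i].indexOf('\u200B');
            if (col >= 0) {
                System.out.println("zero-width space at line " + (i + 1) + ", column " + (col + 1));
            }
        }
    }
}
```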
Eclipse is tracking similar issues [2].
[1]: https://en.wikipedia.org/wiki/Zero-width_space
[2]: https://bugs.eclipse.org/bugs/show_bug.cgi?id=547601
We decided to rename `QueryShardContext` to clarify that it supports all parts
of search request execution. Before there was confusion over whether it should
only be used for building queries, or maybe only used in the query phase. This
PR also updates the javadocs.
Closes #64740.
As part of #66295 we made QueryShardContext perform mapping lookups through MappingLookup rather than MapperService. That helps as MapperService relies on DocumentMapper, which may change throughout the execution of the search request. At search time, the percolate query also needs to parse documents, which made us add a parse method to MappingLookup. Such parse method currently relies on calling DocumentMapper#parseDocument through a function, but we would rather make this easier to follow (see https://github.com/elastic/elasticsearch/pull/66295/files#r544639868).
We recently removed the need to provide the entire DocumentMapper to DocumentParser#parse, opening the possibility for using DocumentParser directly when needing to parse a document at query time. This commit adds everything that is needed (namely Mapping, IndexSettings and IndexAnalyzers) to MappingLookup so that it can parse a document through DocumentParser without relying on DocumentMapper.
As a bonus, given that MappingLookup holds a reference to these three additional objects, we can make DocumentMapper rely on MappingLookup to retrieve those and not hold its own same references to them.
Along the same lines, given that MappingLookup holds all that's necessary to parse a document, the signature of DocumentParser#parse can be simplified by replacing most of its arguments with MappingLookup and retrieving what is needed from it.
This changes the expected error message (on FIPS) so that the
order of the templates (and their associated patterns) matches
the (newly updated) order generated by the server.
Relates: #67066
Resolves: #66820
This change fixes a problem when using space or tab as a separator in the CSV processor: we now check whether the current character is the separator before checking whether it is whitespace.
This also improves tests to always check all combinations of separators and quotes.
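The essence of the ordering fix, sketched with illustrative names rather than the actual CsvParser code:
```
// Sketch: test the configured separator before the generic whitespace check,
// otherwise a space or tab separator is silently skipped as padding.
class CsvSeparatorOrderSketch {
    static int countFields(String line, char separator) {
        int fields = 1;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == separator) {              // separator check must come first
                fields++;
            } else if (c == ' ' || c == '\t') {
                // plain padding around an unquoted field; ignore it
            }
        }
        return fields;
    }
}
```
With a tab separator, `countFields("a\tb\tc", '\t')` returns 3; with the checks reversed, the tabs would be treated as padding and the line would look like a single field.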
Closes #67013
When removing the "lexer hack" to remove type context from the lexer, static inner class resolution
wasn't properly accounted for. This change adds code to handle static inner class resolution.
This commit reworks the InternalClusterInfoService to run
asynchronously, using timeouts on the stats requests instead of
implementing its own blocking timeouts. It also improves the logging of
failures by identifying the nodes that failed or timed out. Finally it
ensures that only a single refresh is running at once, enqueueing later
refresh requests to run immediately after the current refresh is
finished rather than racing them against each other.
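A generic sketch of the "one refresh at a time, queue the next" pattern described above, with illustrative names rather than the actual service code:
```
// Sketch: only one refresh runs at once; requests arriving while a refresh is in
// flight are coalesced into a single follow-up refresh that starts right after it.
class SingleRefreshSketch {
    private boolean refreshing;
    private boolean refreshQueued;
    private final Runnable doRefresh;

    SingleRefreshSketch(Runnable doRefresh) {
        this.doRefresh = doRefresh;
    }

    synchronized void requestRefresh() {
        if (refreshing) {
            refreshQueued = true;      // remember the request, don't race a second refresh
            return;
        }
        refreshing = true;
        new Thread(() -> {
            try {
                doRefresh.run();
            } finally {
                onRefreshDone();
            }
        }).start();
    }

    private synchronized void onRefreshDone() {
        refreshing = false;
        if (refreshQueued) {
            refreshQueued = false;
            requestRefresh();
        }
    }
}
```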
This commit allows returning the correct requested response content-type - it did not work for versioned media types previously.
It is done by adding new vendor-specific instances to the XContent and TextFormat enums. These instances can then "format" the response content-type string when provided with parameters. This is similar to what the SQL plugin does with its media types.
#51816
This leniency was originally for lambda and method reference conversions, but they are both special
cased now. This change removes the unnecessary leniency of a cast from a def type to a void
type. This also fixes #66175.
Currently, an incoming document is parsed through `DocumentMapper#parse`, which in turn calls `DocumentParser#parseDocument`, providing `this` among other arguments. As part of the effort to reduce usages of `DocumentMapper` when possible, as it represents the mutable side of mappings (through mapping updates) and involves complexity, we can carry around only the needed components. This does add some required arguments to `DocumentParser#parseDocument`, though it makes dependencies clearer. This change does not affect end consumers as they all go through DocumentMapper anyway, but by not needing to provide DocumentMapper to parseDocument, we may be able to unblock further improvements down the line.
Relates to #66295
This finishes porting all tasks created in Gradle build scripts and plugins to use
the task avoidance API (see #56610)
* Port all task definitions to task avoidance api
* Fix last task created during configuration
* Fix test setup in :modules:reindex
* Declare proper task inputs
The "Netty loaded" YAML test asserts that the configured transport is
"netty4", however when in FIPS mode, the tests enable security and the
configured transport is "security4".
This change skips the netty4 yaml test when running in FIPS mode.
Resolves: #66818
If the year, year of era, or week-based year is not specified, the ingest Java
date processor defaults the year to the current year.
However, the current implementation mistook the weekBasedYear field for
weekOfWeekBasedYear. This has led to incorrect defaulting.
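In `java.time` terms, the field that needs the default is the week-based year itself, not the week of the week-based year. A small sketch of the intended defaulting (not the processor code):
```
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.temporal.ChronoField;
import java.time.temporal.WeekFields;

// Sketch: a pattern carrying only the week number needs the week-based YEAR
// defaulted to the current year (plus a day of week) before a date can resolve.
public class WeekBasedYearDefaultingSketch {
    public static void main(String[] args) {
        DateTimeFormatter formatter = new DateTimeFormatterBuilder()
            .appendValue(WeekFields.ISO.weekOfWeekBasedYear(), 2)                        // input is just "ww"
            .parseDefaulting(WeekFields.ISO.weekBasedYear(), LocalDate.now().getYear())  // default the year field
            .parseDefaulting(ChronoField.DAY_OF_WEEK, 1)                                 // Monday
            .toFormatter();
        System.out.println(LocalDate.parse("11", formatter)); // Monday of week 11 of the current year
    }
}
```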
Relates #63458
We were depending on BouncyCastle FIPS's own mechanics to set
itself in approved-only mode since we run with the Security
Manager enabled. The check during startup seems to happen before we
set our restrictive SecurityManager in
org.elasticsearch.bootstrap.Elasticsearch though, and this means that
BCFIPS would not be in approved-only mode unless explicitly
configured so.
This commit sets the appropriate JVM property to explicitly set
BCFIPS in approved only mode in CI and adds tests to ensure that we
will be running with BCFIPS in approved only mode when we expect to.
It also sets xpack.security.fips_mode.enabled to true for all test clusters
used in fips mode and sets the distribution to the default one. It adds a
password to the elasticsearch keystore for all test clusters that run in fips
mode.
Moreover, it changes a few unit tests where we would use bcrypt even in
FIPS 140 mode. These would still pass since we are bundling our own
bcrypt implementation, but are now changed to use FIPS 140 approved
algorithms instead for better coverage.
It also addresses a number of tests that would fail in approved-only mode.
Mainly:
- Tests that use PBKDF2 with a password shorter than 112 bits (14 chars). We
elected to change the passwords used everywhere to be at least 14
characters long instead of mandating the use of pbkdf2_stretch, because
both pbkdf2 and pbkdf2_stretch are supported and allowed in FIPS mode and
it makes sense to test with both. We could possibly figure out the
password algorithm used for each test and adjust password length
accordingly only for pbkdf2, but there is little value in that. It's good
practice to use strong passwords, so if our docs and tests use longer
passwords, then it's for the best. The approach is brittle as there is no
guarantee that the next test that will be added won't use a short
password, so we add some testing documentation too.
This leaves us with a possible coverage gap since we do support passwords
as short as 6 characters but we only test with > 14 chars; the validation
itself was not tested even before. Tests can be added in a follow-up,
outside of a FIPS-related context.
- Tests that use a PKCS12 keystore and were not already muted.
- Tests that depend on running test clusters with a basic license or
using the OSS distribution, as FIPS 140 support is not available in
either of these.
Finally, it adds some information around FIPS 140 testing in our testing
documentation reference so that developers can hopefully keep in
mind FIPS 140-related intricacies when writing/changing docs.
This commit introduces a new sort field called `_shard_doc` that
can be used in conjunction with a PIT to consistently tiebreak
identical sort values. The sort value is a numeric long that is
composed of the ordinal of the shard (assigned by the coordinating node)
and the internal Lucene document ID. These two values are consistent within
a PIT so this sort criteria can be used as the tiebreaker of any search
requests.
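A back-of-the-envelope sketch of how such a value can be composed (illustrative only; the exact bit layout used by Elasticsearch may differ):
```
// Sketch: pack the coordinating-node-assigned shard ordinal into the high 32 bits
// and the Lucene doc ID into the low 32 bits; sorting the resulting longs orders
// hits first by shard, then by document ID, which is stable within a PIT.
class ShardDocTiebreakerSketch {
    static long shardDocValue(int shardOrdinal, int luceneDocId) {
        return ((long) shardOrdinal << 32) | (luceneDocId & 0xFFFFFFFFL);
    }

    public static void main(String[] args) {
        System.out.println(shardDocValue(3, 42)); // 12884901930
    }
}
```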
Since this sort criteria is stable we'd like to add it automatically to any
sorted search requests that use a PIT but we also need to expose it explicitly
in order to be able to:
* Reverse the order of the tiebreaking, useful to search "before" `search_after`.
* Force the primary sort to use it in order to benefit from the `search_after` optimization when sorting by index order (to be released in Lucene 8.8).
I plan to add the documentation and the automatic configuration for PIT in a follow up since this change is already big.
Relates #56828
Except when writing actual segment files to the blob store
we always write `BytesReference` instead of a stream.
Only having the stream API available forces needless copies
on us. I fixed the straightforward needless copying for
HDFS and FS repos in this PR; we could do similar fixes for
GCS and Azure as well and thus significantly reduce the peak
memory use of these writes on master nodes in particular.
This PR removes outdated overrides in some tests that prevent them from testing
older index versions. Also removes an old comment + logic from
AggregatorFactoriesTests.
Drops `AggregatorTestCase#mapperServiceMock` because it is getting in
the way of other work I'm doing for runtime fields. It was only
overridden to test the `parent` and `child` aggregation to add the
`MappedFieldType`s for join fields in the backdoor. Those aggregations
can just as easily add those fields in the normal method calls.
This commit fixes a bug in the search_as_you_type field that was introduced during
the refactoring of the field mapper. The prefix field that is used internally
by the search_as_you_type mapper doesn't need term vectors even if they are activated
on the main field. So this commit ensures that we don't copy the options from the main
field when we create the prefix sub-field.
Closes #66407