This change updates the way we handle net new system indices, which are
those that have been newly introduced and do not require any BWC
guarantees around non-system access. These indices will not be included
in wildcard expansions for user searches and operations. Direct access
to these indices will also not be allowed for user searches.
The first index of this type is the GeoIp index, which this change sets
the new flag on.
Closes #72572
Since the SQL `SUM` function behaves as expected, #45251 can be closed.
As soon as #71582 is resolved, we can go back to using the
`sum` aggregation instead of `stats`.
Fixes https://github.com/elastic/elasticsearch/issues/64567
Queries with a literal selection and a filter like `SELECT 1 FROM test_emp WHERE gender = 'F'` are currently erroneously optimised to use a local relation. This causes ES to always return a single record, no matter how many records match the filter condition.
This PR makes sure that `SkipQueryIfFoldingProjection` only skips the query if it's an aggregate with only constants (e.g. `SELECT 'foo' FROM test GROUP BY 1`). This optimization seems to lead to another issue https://github.com/elastic/elasticsearch/issues/74064 that's not yet addressed in this PR.
Besides the "skip query" optimization, the `SkipQueryIfFoldingProjection` class also folds constants from `LocalRelation`s and pushes the evaluated values into a new `LocalRelation` (e.g. for queries like `SELECT 1 + 2`).
Change the formatter config to sort / order imports, and reformat the
codebase. We already had a config file for Eclipse users, so Spotless now
uses that.
The "Eclipse Code Formatter" plugin ought to be able to use this file as
well for import ordering, but in my experiments the results were poor.
Instead, use IntelliJ's `.editorconfig` support to configure import
ordering.
I've also added a config file for the formatter plugin.
Other changes:
* I've quietly enabled the `toggleOnOff` option for Spotless. It was
already possible to disable formatting for sections using the markers
for docs snippets, so enabling this option just accepts this reality
and makes it possible via `formatter:off` and `formatter:on` without
the restrictions around line length. It should still only be used as
a very last resort and with good reason.
* I've removed mention of the `paddedCell` option from the contributing
guide, since I haven't had to use that option for a very long time. I
moved the docs to the spotless config.
This adds an async query mode to SQL.
It (re)uses the same async-specific request and response object
parameters as EQL.
Also similar to EQL, the running search task can have its state
monitored and canceled, and its results stored and deleted; intermediary
responses are not supported (the entire result is available once the
search finishes).
The async implementation is extended to work with the SQL-specific
text formats (txt, csv, tsv) as well, besides xcontent.
Closes #71041.
When libs/core was created, several classes were moved from server's
o.e.common package, but they were not moved to a new package. Split
packages need to go away long term, so that Elasticsearch can even think
about modularization. This commit moves all the classes under o.e.common
in core to o.e.core.
Relates #73784
Previously, when a subquery was used with an alias in combination with
a nested GROUP BY, the collapsing of the nested queries into a flattened
`Aggregate` query led to a wrong attribute qualifier on the external
projection, which was still referencing the removed subquery, e.g.:
For the following query:
```
SELECT languages FROM (
SELECT languages FROM test_emp GROUP BY languages
) AS subquery
```
The `languages` of the top-level SELECT was qualified with `subquery`,
which was removed during the flattening optimisation, leading to an
exception about not being able to resolve the referenced group:
`test_emp.languages`.
Fix this behaviour by introducing a new rule which precedes the
`PruneSubqueryAliases` rule and updates the `qualifier` of the
`FieldAttributes`.
Fixes: #69263
Ordering an already pre-ordered and limited subselect is not allowed,
as such queries cannot be collapsed and translated into query DSL, but
they require an extra ordering step on top of the results returned
internally by the search/agg query.
Fixes: #71158
Extract usage of internal APIs from TestClustersPlugin, PluginBuildPlugin
and related plugins and build logic.
This includes a refactoring of ElasticsearchDistribution to handle types
better, in a way that lets us differentiate between Elasticsearch
distribution types supported in TestClustersPlugin and types only
supported in internal plugins.
It also introduces a set of internal versions of public plugins.
As part of this we also generate the plugin descriptors now.
As a follow-up we can move these publicly used classes into
an extra project (declared as an included build).
We keep LoggedExec and VersionProperties effectively public, and add a
workaround for RestTestBase.
We use `PathUtils.get` to look up paths with our custom testing
infrastructure rather than `Paths.get`. In the past few years Java has
grown a `Path.of` which is very similar to `Paths.get`. Just like
`Paths.get`, we should always be using `PathUtils.get` so that we get
our fancy testing infrastructure. This uses forbiddenapis to ban
`Path.of` and fixes the build errors.
Closes #72392
Related to #71593, we move all build logic that is only for the
Elasticsearch build into the org.elasticsearch.gradle.internal* packages.
This makes it clearer whether build logic is considered to be used by
external projects.
Ultimately we want to only expose TestCluster and PluginBuildPlugin logic
to third-party plugin authors.
This is a very first step towards that direction.
* Fix MIN, MAX, SUM aggs data type handling
This fixes the way the MIN, MAX and SUM handle the returned data types:
- MIN and MAX must return the same data type as the input's.
- SUM must return long/BIGINT for integral types and double otherwise.
The fix concerns both data returned in projections, as well as aggs
filtering.
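A minimal Python sketch of these result-type rules (the type names here are illustrative, not the plugin's actual identifiers):

```python
# Sketch of the aggregate result-type rules described above.
def agg_return_type(func: str, input_type: str) -> str:
    integral = {"byte", "short", "integer", "long"}
    if func in ("MIN", "MAX"):
        return input_type                 # same data type as the input's
    if func == "SUM":
        # long/BIGINT for integral inputs, double otherwise
        return "long" if input_type in integral else "double"
    raise ValueError(f"unsupported aggregate: {func}")

print(agg_return_type("MIN", "float"))    # float
print(agg_return_type("SUM", "integer"))  # long
print(agg_return_type("SUM", "float"))    # double
```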
* Warn users if security is implicitly disabled
Elasticsearch has security features implicitly disabled by default for
Basic and Trial licenses, unless explicitly set in the configuration
file.
This may be good for onboarding, but it can also lead to unintentionally
insecure clusters.
This change introduces clear warnings when security features are
implicitly disabled.
- a warning header in each REST response if security is implicitly
disabled;
- a log message during cluster boot.
Push down filters inside subqueries, even when dealing with aggregates.
The rule already existed however it was not being used inside SQL.
When dealing with Aggregates, keep the aggregate functions in place but
try to push down conjunctions on non-aggregates.
IsNull/IsNotNull inside a conjunction influences other operations done
on the same expression:
f > 10 AND f IS NULL -> f IS NULL
IFNULL(f + 1, f - 1) AND f IS NOT NULL -> f + 1 AND f IS NOT NULL
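The first rewrite can be sketched on a toy expression tree (hypothetical tuple shapes, not the actual SQL plugin classes):

```python
# Toy model of the "f > 10 AND f IS NULL -> f IS NULL" rewrite, on trees
# like ("and", ("gt", "f", 10), ("isnull", "f")).
def fields_of(expr):
    if isinstance(expr, str):
        return {expr}
    if isinstance(expr, tuple):
        out = set()
        for arg in expr[1:]:
            out |= fields_of(arg)
        return out
    return set()  # literals carry no field references

def propagate_is_null(expr):
    """Fold same-field predicates ANDed with an IS NULL check."""
    if isinstance(expr, tuple) and expr[0] == "and":
        left, right = expr[1], expr[2]
        if right[0] == "isnull" and right[1] in fields_of(left):
            return right
    return expr

print(propagate_is_null(("and", ("gt", "f", 10), ("isnull", "f"))))
# ('isnull', 'f')
```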
Fix #70683
The reason for increasing the test coverage on the Verifier
is to gain extra confidence that later moving rules between the
Analyzer and Optimizer won't break any current functionality and
won't render current checks effectively no-ops.
This modifies the fields API to return values with half_float's
precision. This makes the fields API better reflect what we've indexed
which we'd like to do in general. It does make the values that come
back "uglier" because things like `3.14` end up becoming `3.140625`. But
that is what is actually in the index so it's more "real".
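The rounding can be reproduced with Python's stdlib `struct`, which supports the IEEE 754 half-precision format via the `'e'` code:

```python
import struct

# Round-trip a double through 16-bit half precision, as the index does
# for a half_float field.
def to_half(value: float) -> float:
    return struct.unpack("<e", struct.pack("<e", value))[0]

print(to_half(3.14))  # 3.140625 -- the nearest representable half_float
```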
Closes #70260
Previously we did not resolve the attributes recursively which meant that if a field or expression was re-aliased multiple times (through multiple levels of subqueries), the aliases were only resolved one level down. This led to failed query translation because `ReferenceAttribute`s were pointing to non-existing attributes during query translation.
For example the query
```sql
SELECT i AS j FROM ( SELECT int AS i FROM test) ORDER BY j
```
failed during translation because the `OrderBy` resolved the `j` ReferenceAttribute to another `i` ReferenceAttribute that was later removed by an Optimization:
```
OrderBy[[Order[j{r}#4,ASC,LAST]]] ! OrderBy[[Order[i{r}#2,ASC,LAST]]]
\_Project[[j]] = \_Project[[j]]
\_Project[[i]] ! \_EsRelation[test][date{f}#6, some{f}#7, some.string{f}#8, some.string..]
\_EsRelation[test][date{f}#6, some{f}#7, some.string{f}#8, some.string..] !
```
By resolving the `Attributes` recursively both `j{r}` and `i{r}` will resolve to `test.int{f}` above:
```
OrderBy[[Order[test.int{f}#22,ASC,LAST]]] = OrderBy[[Order[test.int{f}#22,ASC,LAST]]]
\_Project[[j]] = \_Project[[j]]
\_Project[[i]] ! \_EsRelation[test][date{f}#6, some{f}#7, some.string{f}#8, some.string..]
\_EsRelation[test][date{f}#6, some{f}#7, some.string{f}#8, some.string..] !
```
The scope of recursive resolution depends on how the `AttributeMap` is constructed and populated.
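Conceptually, the recursive resolution chases aliases until a concrete field attribute is reached; a hypothetical sketch (not the actual `AttributeMap` API):

```python
def resolve(attr, attribute_map):
    """Follow chained aliases until a non-alias (concrete field) remains."""
    seen = set()
    while attr in attribute_map and attr not in seen:
        seen.add(attr)  # guard against accidental cycles
        attr = attribute_map[attr]
    return attr

# j -> i -> test.int, mirroring the two levels of subqueries above
aliases = {"j": "i", "i": "test.int"}
print(resolve("j", aliases))  # test.int
```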
Fixes #67237
The ValueFetcher for geo_shape will shortcut the validation of its
source value if it detects that the source format and the requested
format are the same. This worked fine when malformed values were
dealt with by checking the _ignored metadata, but since #68738
we need to always validate source values at fetch time.
This commit removes this special shortcut logic, and adds tests
to check that geo_shape value fetchers do not return malformed
source inputs.
Fixes #69071
Replace the `YYYY` and `uuuu` year patterns in the examples of
`DATETIME_FORMAT/PARSE` with the most common `yyyy`, to avoid confusion
for users that might just copy-paste those queries for their own use
case.
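The potential confusion is real: `YYYY` denotes the week-based year, which differs from the calendar year near year boundaries. Python's `isocalendar()` shows the same effect:

```python
from datetime import date

# 2019-12-30 belongs to ISO week 1 of 2020, so the week-based year
# ("YYYY") differs from the calendar year ("yyyy") for this date.
d = date(2019, 12, 30)
print(d.year, d.isocalendar()[0])  # 2019 2020
```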
Relates to #68030
This adds a verifier rule to check that any field used in filtering,
aggregations or ordering has doc_values. Otherwise the query will
either fail in ES with a less obvious and more verbose reason, or
plainly give wrong results if filtering with `IS [NOT] NULL`.
The `SELECT ISO_WEEK_OF_YEAR(a) AS x FROM test WHERE x=4` query returned
results with `x=3`, because the `ISO_WEEK_OF_YEAR(a)` in the WHERE
clause (which turns into a script query) and the `ISO_WEEK_OF_YEAR(a)`
in the projections (which turns into post-processing on top of the
Query DSL results) execute different code to calculate the result.
This change unifies the different code paths and results in a single method
being responsible for the actual calculation.
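As an illustration of the "single method" idea in Python (a hypothetical mirror of the approach, not the actual Java code):

```python
from datetime import date

def iso_week_of_year(d: date) -> int:
    """Single source of truth for the ISO week calculation; both the
    filter path and the projection path would call this one function."""
    return d.isocalendar()[1]

print(iso_week_of_year(date(2021, 1, 4)))  # 1 -- first ISO week of 2021
```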
Note: this change impacts the way all the `DateTimeFunction`s that
do the field extraction from a date get translated into a script query.
Fixes part of #67872
The `MINUTE_OF_DAY()` extraction function does not have an equivalent
expressible using a datetime format pattern.
The `MinuteOfDay.dateTimeFormat()` is called during the query
translation and throws an exception, but the return value actually
does not impact the translated query (binary comparisons with
`DateTimeFunction` on one side always turn into a script query).
This change fixes the immediate issue raised as part of #67872 and
adds integration tests covering the problem, but leaves the removal
of the unnecessary `dateTimeFormat()` function to a separate PR.
* Integrate "fields" API into QL (#68467)
* QL: retry SQL and EQL requests in a mixed-node (rolling upgrade) cluster (#68602)
* Adapt nested fields extraction from "fields" API output to the new un-flattened structure (#68745)
Previously, we extracted the result of the `CardinalityAgg` as `double`,
which resulted in values shown in the REST response with `.0`, even though
the type of the corresponding column for `COUNT(DISTINCT <field_name>)`
was showing `long`, e.g.: `152.0` instead of `152`.
This affected only the REST interface of SQL and not the JDBC/ODBC drivers.
Extract a long value instead of a double.
Fixes: #58097
Part 10 (and hopefully the last one).
We have an in-house rule to compare explicitly against `false` instead
of using the logical not operator (`!`). However, this hasn't
historically been enforced, meaning that there are many violations in
the source at present.
We now have a Checkstyle rule that can detect these cases, but before we
can turn it on, we need to fix the existing violations. This is being
done over a series of PRs, since there are a lot to fix.
Fixed the inconsistencies regarding NULL argument handling.
A NULL literal vs a NULL field value as function argument in some cases
resulted in different function return values.
Functions should return the same value no matter whether the argument(s)
came from a field or from a literal.
The introduced integration test checks whether function calls with the
same argument values (regardless of literal/field) return the same
output (it also checks that newly added functions are added to the
test cases).
Fixed the following functions:
* Insert: NULL start, length and replacement arguments (as fields) also
result in a NULL return value instead of returning the input.
* Locate: a NULL pattern results in a NULL return value; a NULL optional
start argument is handled the same as a missing start argument.
* Replace: a NULL pattern or replacement results in NULL instead of
returning the input.
* Substring: a NULL start or length results in NULL instead of returning
the input.
Fixes #58907
Part 7.
We have an in-house rule to compare explicitly against `false` instead
of using the logical not operator (`!`). However, this hasn't
historically been enforced, meaning that there are many violations in
the source at present.
We now have a Checkstyle rule that can detect these cases, but before we
can turn it on, we need to fix the existing violations. This is being
done over a series of PRs, since there are a lot to fix.
As per the new licensing change for Elasticsearch and Kibana this commit
moves existing Apache 2.0 licensed source code to the new dual license
SSPL+Elastic license 2.0. In addition, existing x-pack code now uses
the new version 2.0 of the Elastic license. Full changes include:
- Updating LICENSE and NOTICE files throughout the code base, as well
as those packaged in our published artifacts
- Update IDE integration to now use the new license header on newly
created source files
- Remove references to the "OSS" distribution from our documentation
- Update build time verification checks to no longer allow Apache 2.0
license header in Elasticsearch source code
- Replace all existing Apache 2.0 license headers for non-xpack code
with updated header (vendored code with Apache 2.0 headers obviously
remains the same).
- Replace all Elastic license 1.0 headers with new 2.0 header in xpack.
* Simplify arithmetic operations in binary comps
This commit adds an optimizer rule to simplify the arithmetic operations
in binary comparison expressions, which in turn will allow for further
expression compounding by the optimiser.
Only negation, plus, minus, multiplication and division are currently
considered, and only when one of the operands is a literal.
For instance `((a + 1) / 2 - 3) * 4 >= 14` becomes `a >= 12`.
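The rule effectively "peels" the arithmetic off the field by applying the inverse operations to the literal side. A sketch under simplifying assumptions (it ignores the comparison flip needed when multiplying or dividing by a negative literal):

```python
def peel(ops, literal):
    """Undo ops (innermost-first, as applied to the field) on the
    comparison's literal side."""
    inverse = {
        "+": lambda x, l: x - l,
        "-": lambda x, l: x + l,
        "*": lambda x, l: x / l,
        "/": lambda x, l: x * l,
    }
    for op, l in reversed(ops):  # outermost operation is undone first
        literal = inverse[op](literal, l)
    return literal

# ((a + 1) / 2 - 3) * 4 >= 14  simplifies to  a >= 12
print(peel([("+", 1), ("/", 2), ("-", 3), ("*", 4)], 14))  # 12.0
```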
Use a new internal DataType, DATETIME_NANOS, which is not exposed
and therefore cannot be used for CASTing. DATETIME is used instead
and the precision of both DATETIME and TIME has been promoted from
3 to 9, providing transparency to all datetime functionality regardless
of millis or nanos precision.
Moreover, CURRENT_TIMESTAMP/CURRENT_TIME can now return precision up
to 6 fractional digits of a second with the use of Clock.
Closes: #38562
Co-authored-by: Bogdan Pintea <bogdan.pintea@elastic.co>
SQL: Implement the TO_CHAR() function
* The implementation is according to PostgreSQL 13 specs:
https://www.postgresql.org/docs/13/functions-formatting.html
* Tested against actual output from PostgreSQL 13 using randomized inputs
* All the Postgres formats are supported; there is also partial support
for the modifiers (`FM` and `TH` are supported)
* Random unit test data generator script in case we need to upgrade the
formatter in the future
* Documentation
* Integration tests
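As a toy illustration of the pattern-translation idea, here is a hypothetical mini `TO_CHAR` that maps a handful of PostgreSQL patterns onto `strftime` directives (the real implementation supports far more patterns plus the `FM`/`TH` modifiers):

```python
from datetime import datetime

# A few PostgreSQL TO_CHAR patterns and their strftime equivalents.
PATTERNS = {"YYYY": "%Y", "MM": "%m", "DD": "%d",
            "HH24": "%H", "MI": "%M", "SS": "%S"}

def to_char(ts: datetime, fmt: str) -> str:
    # Replace longer patterns first so "HH24" wins over any 2-char match.
    for pg, py in sorted(PATTERNS.items(), key=lambda kv: -len(kv[0])):
        fmt = fmt.replace(pg, py)
    return ts.strftime(fmt)

print(to_char(datetime(2021, 3, 9, 14, 5, 7), "YYYY-MM-DD HH24:MI:SS"))
# 2021-03-09 14:05:07
```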
Co-authored-by: Michał Wąsowicz <mwasowicz7@gmail.com>
Co-authored-by: Andras Palinkas <andras.palinkas@elastic.co>
In #60357 we improved the error message when access to perform an
action on an index was denied by including the index name and the
privileges that would grant the action.
This commit extends the second part of that change (the list of
privileges that would resolve the problem) to situations when a
cluster action is denied.
This implementation for cluster privileges is slightly more complex
than that of index privileges because cluster privileges can be
dependent on parameters in the request, not just the action name.
For example, "manage_own_api_key" should be suggested as a matching
privilege when a user attempts to create an API key, or delete their
own API key, but should not be suggested when that same user attempts
to delete another user's API key.
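The request-dependence can be sketched as follows (function and action names are illustrative, not the actual security-module code):

```python
# Hypothetical sketch: whether "manage_own_api_key" should be suggested
# depends on the request's target, not only on the action name.
def suggest_manage_own_api_key(action: str, key_owner: str, user: str) -> bool:
    if not action.startswith("cluster:admin/xpack/security/api_key/"):
        return False
    return key_owner == user  # only the user's own keys qualify

act = "cluster:admin/xpack/security/api_key/invalidate"  # illustrative
print(suggest_manage_own_api_key(act, "alice", "alice"))  # True
print(suggest_manage_own_api_key(act, "bob", "alice"))    # False
```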
Relates: #42166
Enhance the resolution of aliases declared inside sub-queries so their
outside use is not restricted only to projections.
This commit replaces attribute references in aggregate groupings and
extends collapsing of Aggregates on top of projections.
Fix #56713
This commit allows returning the correct requested response content-type; previously this did not work for versioned media types.
It is done by adding new vendor-specific instances to the XContent and TextFormat enums. These instances can then "format" the response content type string when provided with parameters. This is similar to what the SQL plugin does with its media types.
#51816
We were depending on BouncyCastle FIPS' own mechanics to set
itself in approved-only mode, since we run with the Security
Manager enabled. The check during startup seems to happen before we
set our restrictive SecurityManager in
org.elasticsearch.bootstrap.Elasticsearch, though, and this means
that BCFIPS would not be in approved-only mode unless explicitly
configured so.
This commit sets the appropriate JVM property to explicitly set
BCFIPS in approved only mode in CI and adds tests to ensure that we
will be running with BCFIPS in approved only mode when we expect to.
It also sets xpack.security.fips_mode.enabled to true for all test clusters
used in fips mode and sets the distribution to the default one. It adds a
password to the elasticsearch keystore for all test clusters that run in fips
mode.
Moreover, it changes a few unit tests where we would use bcrypt even in
FIPS 140 mode. These would still pass since we are bundling our own
bcrypt implementation, but are now changed to use FIPS 140 approved
algorithms instead for better coverage.
It also addresses a number of tests that would fail in approved-only
mode. Mainly:
Tests that use PBKDF2 with a password of fewer than 112 bits (14 chars).
We elected to change the passwords used everywhere to be at least 14
characters long instead of mandating the use of pbkdf2_stretch, because
both pbkdf2 and pbkdf2_stretch are supported and allowed in FIPS mode
and it makes sense to test with both. We could possibly figure out the
password algorithm used for each test and adjust password length
accordingly only for pbkdf2, but there is little value in that. It's
good practice to use strong passwords, so if our docs and tests use
longer passwords, then it's for the best. The approach is brittle as
there is no guarantee that the next test that will be added won't use
a short password, so we add some testing documentation too.
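The 112-bit minimum maps directly to 14 ASCII characters (14 × 8 = 112); a quick stdlib illustration:

```python
import hashlib, os

# FIPS 140 requires PBKDF2 passwords of at least 112 bits; with ASCII
# characters that means at least 14 of them (14 * 8 = 112).
password = b"longer-passw0rd"  # 15 chars, above the minimum
assert len(password) * 8 >= 112
key = hashlib.pbkdf2_hmac("sha256", password, os.urandom(16), 10_000)
print(len(key))  # 32-byte derived key
```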
This leaves us with a possible coverage gap, since we do support
passwords as short as 6 characters but we only test with > 14 chars;
the validation itself was not tested even before. Tests can be added
in a follow-up, outside of the FIPS-related context.
Tests that use a PKCS12 keystore and were not already muted.
Tests that depend on running test clusters with a basic license or
using the OSS distribution, as FIPS 140 support is not available in
either of these.
Finally, it adds some information around FIPS 140 testing in our testing
documentation reference, so that developers can hopefully keep in
mind FIPS 140-related intricacies when writing/changing docs.