This renames the text structure finder action to match the plugin name.
Also, this adds a new reserved role name so that adding specific permissions for this API is simple.
We no longer regard the autoscaling APIs as experimental, though they are
only intended for use by ESS/ECE/ECK. This commit updates the docs
to reflect this and adds a minimal set of documentation for the
feature.
Co-authored-by: James Rodewig <40268737+jrodewig@users.noreply.github.com>
* Updating dynamic mappings for runtime fields.
* Updating example to fix test case and be more accurate.
* Changing header level for dynamic runtime.
* Clarifying language around ip fields in dynamic template.
Limit the depth of nested bool queries
Introduce a new node-level setting `indices.query.bool.max_nested_depth`
that controls the maximum depth of nested bool queries.
Throw an error if the nested depth of a bool query exceeds the maximum
allowed depth.
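A minimal sketch of how this could be exercised (the index name and the depth limit of 2 are illustrative, not defaults):

```sh
# Illustrative only: indices.query.bool.max_nested_depth is a node-level
# setting, so it would go in elasticsearch.yml, e.g.:
#   indices.query.bool.max_nested_depth: 2
# A bool query nested more deeply than that is then rejected at parse time:
curl -X GET "localhost:9200/my-index/_search" -H 'Content-Type: application/json' -d'
{
  "query": {
    "bool": {
      "must": {
        "bool": {
          "must": {
            "bool": {
              "must": { "match_all": {} }
            }
          }
        }
      }
    }
  }
}'
```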
Closes #55303
* Changes for dynamic templates.
* Clarifying language around dynamic:true and dynamic:runtime.
* Clarifying edits and some restructuring.
* Overhauling the Mapping page.
* Incorporating changes from #66911.
* Reworking mapping page to focus on dynamic vs. explicit mapping.
* Reordering to fix test failure.
* Further clarifying mapping page.
* Reordering sections, adding headings to examples, and other clarifications.
* Incorporating review feedback.
* Adding description for Painless script.
This introduces a new `text-structure` plugin. This is the new home of the find file structure API.
The old REST URL is still available but is deprecated.
The new URL is: `_text_structure/find_structure`. All parameters and behavior are unchanged.
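For example (a sketch; the NDJSON sample lines are made up):

```sh
# Find the structure of some sample NDJSON lines via the new URL:
curl -X POST "localhost:9200/_text_structure/find_structure?pretty" \
  -H 'Content-Type: application/json' --data-binary @- <<'EOF'
{"name": "alice", "timestamp": "2020-12-01T00:00:00Z", "value": 1}
{"name": "bob", "timestamp": "2020-12-01T00:00:01Z", "value": 2}
EOF
```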
Changes to the high-level REST client and docs will be in a separate commit.
Relates to https://github.com/elastic/elasticsearch/issues/67001
In 7.x the close indices API defaulted to `?wait_for_active_shards=0`
but from 8.0 it defaults to respecting the index settings instead. This
commit introduces the `index-setting` value for this parameter on this
API, allowing users to opt in to the future behaviour today, and emits a
deprecation warning indicating that the default no longer needs to be
used and will be unsupported in the future.
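For example, to opt in today (index name illustrative):

```sh
# Close an index respecting its wait-for-active-shards index setting,
# as 8.0 will do by default:
curl -X POST "localhost:9200/my-index/_close?wait_for_active_shards=index-setting"
```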
In 7.x a follow-up PR will introduce support for the same
`index-setting` value for this parameter and will emit deprecation
warnings if users try to use the default instead.
Relates #66419
* [DOCS] Updated data streams list screenshots and delete functionality description
* Update docs/reference/data-streams/set-up-a-data-stream.asciidoc
Co-authored-by: James Rodewig <40268737+jrodewig@users.noreply.github.com>
* Update set-up-a-data-stream.asciidoc
Co-authored-by: James Rodewig <40268737+jrodewig@users.noreply.github.com>
This commit allows returning the correct requested response content type, which previously did not work for versioned media types.
It is done by adding new vendor-specific instances to the XContent and TextFormat enums. These instances can then "format" the response content-type string when provided with parameters. This is similar to what the SQL plugin does with its media types.
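A sketch of the kind of request this affects (assuming the compatible-with vendor media type syntax used by REST API compatibility):

```sh
# Ask for a versioned media type; the response Content-Type should now echo
# the vendored type back instead of plain application/json:
curl -X GET "localhost:9200/_search" \
  -H 'Accept: application/vnd.elasticsearch+json;compatible-with=7'
```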
#51816
We were depending on the BouncyCastle FIPS module's own mechanics to set
itself in approved-only mode, since we run with the Security
Manager enabled. The check during startup seems to happen before we
set our restrictive SecurityManager in
org.elasticsearch.bootstrap.Elasticsearch, which means that
BCFIPS would not be in approved-only mode unless explicitly
configured so.
This commit sets the appropriate JVM property to explicitly put
BCFIPS in approved-only mode in CI, and adds tests to ensure that we
are running with BCFIPS in approved-only mode when we expect to be.
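For reference, a sketch of the property in question (as documented by BouncyCastle; the invocation is illustrative):

```sh
# Force BCFIPS into approved-only mode regardless of SecurityManager timing:
ES_JAVA_OPTS="-Dorg.bouncycastle.fips.approved_only=true" bin/elasticsearch
```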
It also sets `xpack.security.fips_mode.enabled` to true for all test clusters
used in FIPS mode and sets the distribution to the default one. It adds a
password to the Elasticsearch keystore for all test clusters that run in FIPS
mode.
Moreover, it changes a few unit tests where we would use bcrypt even in
FIPS 140 mode. These would still pass since we are bundling our own
bcrypt implementation, but are now changed to use FIPS 140-approved
algorithms instead for better coverage.
It also addresses a number of tests that would fail in approved-only mode.
Mainly:
* Tests that use PBKDF2 with a password shorter than 112 bits (14 characters).
We elected to change the passwords used everywhere to be at least 14
characters long instead of mandating the use of pbkdf2_stretch, because
both pbkdf2 and pbkdf2_stretch are supported and allowed in FIPS mode
and it makes sense to test with both. We could possibly figure out the
password algorithm used for each test and adjust the password length
accordingly only for pbkdf2, but there is little value in that. It's good
practice to use strong passwords, so if our docs and tests use longer
passwords, then it's for the best. The approach is brittle as there is no
guarantee that the next test to be added won't use a short password, so we
add some testing documentation too.
This leaves us with a possible coverage gap, since we do support passwords
as short as 6 characters but only test with more than 14; however, the
validation itself was not tested even before. Tests can be added in a
follow-up, outside of the FIPS-related context.
* Tests that use a PKCS12 keystore and were not already muted.
* Tests that depend on running test clusters with a basic license or
using the OSS distribution, as FIPS 140 support is not available in
either of these.
Finally, it adds some information around FIPS 140 testing in our testing
documentation reference so that developers can hopefully keep in
mind FIPS 140-related intricacies when writing/changing docs.
This makes sure that we only serve a hit from the request cache if it
was built using the same mapping, and that the same mapping is used for
the entire "query phase" of the search.
Closes #62033
In 8.x the default for `?wait_for_active_shards` changes from `NONE` to
`DEFAULT` on calls to `POST /index/_close`. This commit adds this change
to the breaking changes docs.
Relates #66419, #66542
Generating certificates with the cert sub-command now requires either: 1) providing
a CA with --ca or --ca-cert/--ca-key; or 2) making them self-signed
with the --self-signed option. Generating a CA on the fly is no longer
supported. The --keep-ca-key option is removed, and the tool throws an error
saying the CA needs to be generated separately if the option is specified.
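A sketch of the two supported workflows (file names illustrative):

```sh
# 1) Generate the CA separately, then use it to sign certificates:
bin/elasticsearch-certutil ca --out my-ca.p12
bin/elasticsearch-certutil cert --ca my-ca.p12 --out my-cert.p12

# 2) Or generate a self-signed certificate with the new option:
bin/elasticsearch-certutil cert --self-signed --out my-self-signed.p12
```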
This is a follow-up PR for #61884, which deprecated the "ca-on-the-fly" usage.
There is little evidence of this endpoint being used,
and there is quite a lot of code complexity associated
with the various formats that can be used to upload
data and the different errors that can occur when direct
data upload is open to end users.
In a future release we can make this endpoint internal
so that only datafeeds can use it, and remove all the
options and formats that are not used by datafeeds.
End users will have to store their input data for
anomaly detection in Elasticsearch indices (which we
believe all do today) and use a datafeed to feed it
to anomaly detection jobs.
We lost the `logger.transport.name` line in #65169 and I incorrectly
extrapolated from what was left and mangled it further in #66318. This
commit fixes things.
Added documentation for the aggregate_metric_double field that was merged in #56745
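A minimal sketch of the documented mapping (index and field names illustrative):

```sh
# A pre-aggregated metric field storing min/max/sum/value_count:
curl -X PUT "localhost:9200/my-index" -H 'Content-Type: application/json' -d'
{
  "mappings": {
    "properties": {
      "my-agg-metric-field": {
        "type": "aggregate_metric_double",
        "metrics": [ "min", "max", "sum", "value_count" ],
        "default_metric": "max"
      }
    }
  }
}'
```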
Co-authored-by: James Rodewig <40268737+jrodewig@users.noreply.github.com>
This commit fixes a potential bug that would surface if we support anomaly
detection results index rollover in the future.
In particular, we determine the current `data_counts` by sorting on the
latest record time. However, this is not correct if the job reverts
to an older model snapshot. To fix this we add `log_time` to `data_counts`
(similarly to `model_size_stats`) and sort on `log_time` to figure
out the current counts for the job.
Today the docs use `logger.org.elasticsearch.transport: TRACE` as the
example for adjusting the logger config. This is a dangerous thing to
suggest since that's one of the most verbose loggers we have. An
accidental copy-and-paste of this example into a busy cluster can
cause damage.
This commit suggests `logger.org.elasticsearch.discovery: DEBUG`
instead, which is much more benign.
It also corrects the order of the levels and notes that `DEBUG` and
`TRACE` are only for expert use.
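For example, a sketch of applying the safer suggestion dynamically via the cluster settings API:

```sh
# DEBUG on the discovery package is much more benign than TRACE on transport:
curl -X PUT "localhost:9200/_cluster/settings" -H 'Content-Type: application/json' -d'
{
  "persistent": {
    "logger.org.elasticsearch.discovery": "DEBUG"
  }
}'
```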
This commit moves the ownership of tracking the rollup_index from
the RollupActionConfig to the RollupAction.Request.
This is cleaner since the config should not be concerned with the
source and rollup indices.
Relates #42720.
Co-authored-by: James Rodewig <40268737+jrodewig@users.noreply.github.com>
Removed the autoscaling feature flags; autoscaling is now on by default
(though it requires an external system to handle the autoscaling
events). Added an experimental notice to all autoscaling-related
documentation pages.
Relates #51191
The option to use templates already defined in the cluster is not explicitly stated in the docs.
This PR adds an example to the `rank_eval` documentation.
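A sketch of the documented usage, assuming the `template` object accepts a stored search template id the same way the scripting API does (all names illustrative):

```sh
curl -X GET "localhost:9200/my-index/_rank_eval" -H 'Content-Type: application/json' -d'
{
  "templates": [
    { "id": "match_query", "template": { "id": "stored-search-template" } }
  ],
  "requests": [
    {
      "id": "amsterdam_query",
      "ratings": [ { "_index": "my-index", "_id": "1", "rating": 1 } ],
      "template_id": "match_query",
      "params": { "query_string": "amsterdam" }
    }
  ],
  "metric": { "precision": { "k": 10 } }
}'
```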
Closes #62758.
Include the Stack log4j config in the Docker image, in order to
make it possible to write logs in a container environment in the
same way as for an archive or package deployment. This is useful
in situations where the user is bind-mounting the logs directory
and has their own arrangements for log shipping.
To use stack logging, set the environment variable `ES_LOG_STYLE`
to `file`. It can also be set to `console`, which is the same as
not specifying it at all.
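For example (image tag and bind-mount path are illustrative):

```sh
# Write logs as files under the logs directory, Stack-style, instead of
# streaming everything to the console:
docker run -e ES_LOG_STYLE=file \
  -v "$PWD/logs:/usr/share/elasticsearch/logs" \
  docker.elastic.co/elasticsearch/elasticsearch:7.11.0
```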
The Docker logging config is now auto-generated at image build time,
by running the default config through a transformer program when
preparing the distribution in an image builder step.
Also, in the Docker distribution `build.gradle`, I changed a helper
closure into a class with a static method in order to fix an
issue where the Docker image was always being rebuilt, even when
there were no changes.
* First steps in docs for runtime fields.
* Adding new page for runtime fields.
* Adding page for runtime fields.
* Adding more to the runtime fields topic.
* Adding parameters and retrieval options for runtime fields.
* Adding TESTSETUP for index creation.
* Incorporating review feedback.
* Incorporating reviewer feedback.
* Adding examples for runtime fields.
* Adding more context and simplifying the example.
* Changing timestamp to @timestamp throughout.
* Removing duplicate @timestamp field.
* Expanding example to hopefully fix CI builds.
* Adding skip test for result.
* Adding missing callout.
* Adding TESTRESPONSEs, which are currently broken.
* Fixing TESTRESPONSEs.
* Incorporating review feedback.
* Several clarifications, better test cases, and other changes.
* Adding missing callout in example.
* Adding substitutions to TESTRESPONSE for shorter results shown.
* Shuffling some information and adding link to script-fields.
* Fixing typo.
* Updates for API redesign -- will break builds.
* Updating examples and including info about overriding fields.
* Updating examples.
* Adding info for using runtime fields in the search request.
* Adding that queries against runtime fields are expensive.
* Incorporating feedback from reviewers.
* Minor changes from reviews.
* Adding alias for test case.
* Adding aliases to PUT example.
* Fixing test cases, for real this time.
* Updating use cases and introducing overlay throughout.
* Edits, adding 'shadowing', and explaining shadowing better.
* Streamlining tests and other changes.
* Fix formatting in example for test.
* Apply suggestions from code review
* Incorporating reviewer feedback 7 Dec
* Shifting structure of mapping page to fix cross links.
* Revisions for shadowing, overview, and other sections.
* Removing dot notation section and incorporating review changes.
* Adding updated example for shadowing.
* Streamlining shadowing example and TESTRESPONSEs.
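A minimal sketch of the kind of runtime field these docs describe (index name is illustrative; the Painless script follows the docs' usual day_of_week example rather than being taken verbatim from this PR):

```sh
# Define a keyword runtime field computed by a Painless script at search time:
curl -X PUT "localhost:9200/my-index" -H 'Content-Type: application/json' --data-binary @- <<'EOF'
{
  "mappings": {
    "runtime": {
      "day_of_week": {
        "type": "keyword",
        "script": {
          "source": "emit(doc['@timestamp'].value.dayOfWeekEnum.getDisplayName(TextStyle.FULL, Locale.ROOT))"
        }
      }
    },
    "properties": {
      "@timestamp": { "type": "date" }
    }
  }
}
EOF
```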
Today we document the use of `_[networkInterface]_` to specify the
addresses of a network interface but do not spell out which parts of
this syntax should be taken literally and which are part of the
placeholder for the interface name. If you get it wrong then the
exception message is confusing too, since it uses the result of
`NetworkInterface#toString()`, which contains much more than just the
name of the interface.
This commit clarifies the docs and the exception message.
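For example (the interface name en0 is the placeholder; the underscores are literal):

```sh
# Bind to the addresses of interface en0; only "en0" varies, the underscores stay:
bin/elasticsearch -E network.host=_en0_
```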
Closes #65978.
It's been several months and we haven't bumped into any good reason to
rework the variable width histogram. So let's drop experimental from it!
Closes #58573
When a data stream is being auto-followed, a rollover in the local cluster can break auto-following:
if the local cluster performs a rollover, it creates a new write index, and if the remote
cluster later rolls over as well, that new write index can't be replicated, because it has the same name
as the write index in the local cluster, which was created earlier.
If a data stream is managed by CCR, then the local cluster should not do a rollover for those data streams.
The data stream should be rolled over in the remote cluster, and that change should replicate to the local
cluster. Performing the rollover in the local cluster is an operation that the data stream support in CCR should
perform.
To protect against rolling over a replicated data stream, this PR adds a replicated field to the DataStream class.
The rollover API will fail with an error in case the targeted data stream is
a replicated data stream. When the put-follow API creates a data stream in the local cluster, the replicated flag
is set to true. There should be a way to turn a replicated data stream into a regular data stream, for example
during disaster recovery. The newly added API in this PR (the promote data stream API) does exactly that. After a
replicated data stream is promoted to a regular data stream, the local data stream can be rolled over, so that the new
write index is no longer a follower index. Also, if the put-follow API attempts to update this data stream
(for example, to resume auto-following), that will fail, because the data stream is no longer a
replicated data stream.
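A sketch of the new API (data stream name illustrative):

```sh
# Promote a replicated data stream to a regular one, e.g. during disaster recovery:
curl -X POST "localhost:9200/_data_stream/_promote/my-data-stream"
```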
Today, with time-based indices behind an alias, the is_write_index property isn't replicated from the remote cluster
to the local cluster, so when attempting to roll over the alias in the local cluster the rollover fails, because the
alias doesn't have a write index. The replicated field added to the DataStream class and the added validation
achieve the same kind of protection, but in a more robust way.
A follow-up from #61993.
Today we document the settings used to control rebalancing and
disk-based shard allocation, but there isn't really any discussion of
what these processes do, so it's hard to know what adjustments, if any,
to make.
This commit adds some words to help folk understand this area better.
Transform writes dates as epoch millis; this does not work for historic data in some cases or is
unsupported. Dates should be written as such. With this PR transform starts writing dates in ISO
format, but as existing transforms might rely on the old format it provides backwards compatibility for
old jobs, as well as a setting to write dates as epoch millis.
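A sketch of opting back into the old behaviour, assuming the setting is exposed under the transform's `settings` object as `dates_as_epoch_millis` (transform id illustrative):

```sh
# Keep writing dates as epoch millis for an existing transform:
curl -X POST "localhost:9200/_transform/my-transform/_update" -H 'Content-Type: application/json' -d'
{
  "settings": { "dates_as_epoch_millis": true }
}'
```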
Fixes #63787
This is a follow-up PR for #65256 to fix the xpack info and usage reports for
operator privileges. In summary, this PR ensures:
* _xpack does not report operator privileges because it is categorised under
security
* _xpack/usage reports operator privileges status under the security
section
* _license/feature_usage reports last used time of operator privileges.
It is up to the downstream to filter out this report if necessary.
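Per the bullets above, a sketch of where the status now shows up (using `filter_path` to trim the response):

```sh
# Operator privileges status is reported under the security section of usage:
curl "localhost:9200/_xpack/usage?filter_path=security"
```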
In case the local agg sorter queue gets full and no limit has been provided,
the local sorter will erroneously call the failure callback for every
single row in the original rowset that's left over the local queue limit
(instead of just for the first one). The failure response is dispatched in any
case, so this is relatively harmless. The sorter continues iterating on the
original response, fetching subsequent pages. In case of correct Elasticsearch
behaviour, this is also harmless; it'll just trigger a number of internal
exceptions. However, in case of a pagination defect in Elasticsearch (like
GH#65685, where the same search_after is returned), this will result in an
effective spin loop, potentially eventually rendering the node unresponsive.
This PR simply breaks both the inner loop iterating over the current unsorted
rowset and the outer one iterating over the remaining pages.
It also fixes an outdated documentation limitation.
At present the Java code makes a decision on whether to
use current model memory or model memory limit to calculate
how much memory a job requires to be assigned.
The plan is to move this decision to the C++ code, which will
report it via a new field in the model size stats. An
additional change will be that once we have made the switch
from using model memory limit to using current model memory
we will never switch back, as this causes large fluctuations
up and down in memory requirement which will be much more
noticeable when autoscaling is in use.
Although the only two options at present are model memory
limit and current model memory, the new enum includes a
third possibility, peak model memory. To switch to this
now would be tricky, as there have been two bugs in the
implementation of peak model memory which render its value
unreliable in 7.x. However, in 8.x it might make sense to
switch to using peak model memory instead of current model
memory and it's much easier from a BWC perspective if the
enum contains all the values from the start.
Relates #63163
In some Elastic Stack environments, there is a distinction between the operator
of the cluster infrastructure and the administrator of the cluster. This
distinction cannot be supported currently because the "administrator" often has
the superuser role, which grants each and every privilege of the cluster.
This PR adds a new feature to protect a fixed set of APIs from the
"administrator" even when it is a highly privileged user such as superuser. It
enhances the Elasticsearch security model to have an additional layer of
restriction in addition to RBAC.
Co-authored-by: Tim Vernum <tim@adjective.org>
Today in `7.x` there is a deprecated system property that bypasses the
check that prevents nodes of incompatible builds from communicating.
This commit removes the system property in `master` so that the check is
always enforced.
Relates #65601, #65249
Currently, the 'fields' option only supports fetching mapped fields. Since
'fields' is meant to be the central place to retrieve document content, it
should allow for loading unmapped values. This change adds implementation and
tests for this feature.
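A sketch of fetching unmapped values (assuming the object form of a `fields` entry with an `include_unmapped` flag, as this change describes; index name illustrative):

```sh
# Fetch all fields, including ones with no mapping, via the fields option:
curl -X POST "localhost:9200/my-index/_search" -H 'Content-Type: application/json' -d'
{
  "query": { "match_all": {} },
  "fields": [ { "field": "*", "include_unmapped": true } ],
  "_source": false
}'
```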
Closes #63690
Currently, if you write a date range query with numeric 'to' or 'from' bounds,
they can be interpreted as years if no format is provided. We use
"strict_date_optional_time||epoch_millis" in this case, which can interpret inputs
like 1000 as the year 1000, for example.
This PR changes this to always interpret and parse numbers with the "epoch_millis"
parser if no other formatter was provided.
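For example, with this change the following bounds are parsed as epoch millis rather than years (field and index names illustrative):

```sh
# 1000 now means 1970-01-01T00:00:01Z, not the year 1000:
curl -X POST "localhost:9200/my-index/_search" -H 'Content-Type: application/json' -d'
{
  "query": {
    "range": {
      "@timestamp": { "gte": 1000, "lt": 2000 }
    }
  }
}'
```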
Closes #63680
Today we describe snapshots as "incremental" but their incrementality is
a rather different beast from e.g. incremental filesystem backups. With
traditional backups you take a large and relatively infrequent "full"
backup and then a sequence of smaller "incremental" ones, and this whole
sequence of backups is required for a restore so it must be kept around
until at least the next full backup. In contrast, Elasticsearch
snapshots are logically independent and each can be deleted without
affecting the integrity of the others.
This distinction frequently causes confusion amongst newer users, so
this commit clarifies what we mean by "incremental" in the docs.
Whether the cold tier can handle years depends a lot on the use case and
for instance our BWC guarantees. This would need to be part of a
specific sizing exercise, so in the spirit of not over-promising, the
description of the cold tier has been changed to not mention years.
Generating a CA on the fly is an attempt at workflow optimisation that was
inherited from certgen. There are potential pitfalls with this approach. Overall
it is recommended to separate the step of CA creation and mandate a CA to be
specified when generating certificates.
This PR adds a deprecation message if the cert command is used without specifying
a CA. A follow-up PR will throw an error for this usage in 8.0.
For use cases where we explicitly trust a certificate without needing a CA, e.g.
SAML message signing, the PR adds a --self-signed option to the cert sub-command
to generate self-signed certificates.
Clarify that searchable snapshots only result in cost savings for less
frequently accessed data and that the savings do not apply to the entire
cluster.
* Introduce an additional hasher that is PBKDF2 but pads the input to > 14 chars before hashing to comply with FIPS Approve Only mode
* Addressing the PR feedback; adding doc changes
* Renaming the hash function + rephrasing the doc descriptions
* Removing leftover from the doc
* Return HexCharArray instead of Base64 encoding and avoid intermediate
String
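A sketch of selecting the renamed hasher (assuming it is exposed as `pbkdf2_stretch` through the usual password-hashing setting, matching the pbkdf2_stretch mentioned elsewhere in these notes):

```sh
# Use the padded PBKDF2 variant so short passwords comply with FIPS approved-only mode:
bin/elasticsearch -E xpack.security.authc.password_hashing.algorithm=pbkdf2_stretch
```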
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>