Today we document the use of `_[networkInterface]_` to specify the
addresses of a network interface but do not spell out which parts of
this syntax should be taken literally and which are part of the
placeholder for the interface name. If you get it wrong then the
exception message is confusing too since it uses the results of
`NetworkInterface#toString()` which contains much more than just the
name of the interface.
This commit clarifies the docs and the exception message.
Closes#65978.
Its been several months and we haven't bumped into any good reason to
rework the variable width histogram. So let's drop experimental from it!
Closes#58573
When a data stream is being auto followed then a rollover in a local cluster can break auto following,
if the local cluster performs a rollover then it creates a new write index and if then later the remote
cluster rolls over as well then that new write index can't be replicated, because it has the same name
as in the write index in the local cluster, which was created earlier.
If a data stream is managed by ccr, then the local cluster should not do a rollover for those data streams.
The data stream should be rolled over in the remote cluster and that change should replicate to the local
cluster. Performing a rollover in the local cluster is an operation that the data stream support in ccr should
perform.
To protect against rolling over a replicated data stream, this PR adds a replicate field to DataStream class.
The rollover api will fail with an error in case a data stream is being rolled over and the targeted data stream is
a replicated data stream. When the put follow api creates a data stream in the local cluster then the replicate flag
is set to true. There should be a way to turn a replicated data stream into a regular data stream when for example
during disaster recovery. The newly added api in this pr (promote data stream api) is doing that. After a replicated
data stream is promoted to a regular data stream then the local data stream can be rolled over, so that the new
write index is no longer a follower index. Also if the put follow api is attempting to update this data stream
(for example to attempt to resume auto following) then that with fail, because the data stream is no longer a
replicated data stream.
Today with time based indices behind an alias, the is_write_index property isn't replicated from remote cluster
to the local cluster, so when attempting to rollover the alias in the local cluster the rollover fails, because the
alias doesn't have a write index. The added replicated field in the DataStream class and added validation
achieve the same kind of protection, but in a more robust way.
A followup from #61993.
Today we document the settings used to control rebalancing and
disk-based shard allocation but there isn't really any discussion around
what these processes do so it's hard to know what, if any, adjustments
to make.
This commit adds some words to help folk understand this area better.
Transform writes dates as epoch millis, this does not work for historic data in some cases or is
unsupported. Dates should be written as such. With this PR transform starts writing dates in ISO
format, but as existing transform might rely on the format it provides backwards compatibility for
old jobs as well as a setting to write dates as epoch millis.
fixes#63787
This is a follow-up PR for #65256 to fix the xpack info and usage reports for
operator privilegs. In summary, this PR ensures:
* _xpack does not report operator privileges because it is categorised under
security
* _xpack/usage reports operator privileges status under the security
section
* _license/feature_usage reports last used time of operator privileges.
It is up to the downstream to filter out this report if necessary.
In case the local agg sorter queue gets full and no limit has been provided,
the local sorter will now erroneously call the failure callback for every
single row in the original rowset that's left over the local queue limit
(instead for just the first one). The failure response is dispatched in any
case, so this is relatively harmless. The sorter continues iterating on the
original response fetching subsequent pages. In case of correct Elasticsearch
behaviour, this is also harmless, it'll just trigger a number of internal
exceptions. However, in case of a pagination defect in Elasticsearch (like
GH#65685, where the same search_after is returned), this will result in an
effective spin loop, potentially rendering eventually the node unresponsive.
This PR simply breaks both the inner loop iterating over the current unsorted
rowset, as well as the outer one, iterating over the left pages.
It also fixes an outdated documentation limitation.
At present the Java code makes a decision on whether to
use current model memory or model memory limit to calculate
how much memory a job requires to be assigned.
The plan is to move this decision to the C++ code, which will
report it via a new field in the model size stats. An
additional change will be that once we have made the switch
from using model memory limit to using current model memory
we will never switch back, as this causes large fluctuations
up and down in memory requirement which will be much more
noticeable when autoscaling is in use.
Although the only two options at present are model memory
limit and current model memory, the new enum includes a
third possibility, peak model memory. To switch to this
now would be tricky, as there have been two bugs in the
implementation of peak model memory which render its value
unreliable in 7.x. However, in 8.x it might make sense to
switch to using peak model memory instead of current model
memory and it's much easier from a BWC perspective if the
enum contains all the values from the start.
Relates #63163
In some Elastic Stack environments, there is a distinction between the operator
of the cluster infrastructure and the administrator of the cluster. This
distinction cannot be supported currently because the "administrator" often has
the superuser role which grants each and every privilege of the cluster.
This PR adds a new feature to protect a fixed set of APIs from the
"administrator" even when it is a highly privileged user such as superuser. It
enhances the Elasticsearch security model to have an additional layer of
restriction in addition to the RBAC.
Co-authored-by: Tim Vernum <tim@adjective.org>
Today in `7.x` there is a deprecated system property that bypasses the
check that prevents nodes of incompatible builds from communicating.
This commit removes the system property in `master` so that the check is
always enforced.
Relates #65601, #65249
Currently, the 'fields' option only supports fetching mapped fields. Since
'fields' is meant to be the central place to retrieve document content, it
should allow for loading unmapped values. This change adds implementation and
tests for this feature.
Closes#63690
Currently, if you write a date range query with numeric 'to' or 'from' bounds,
they can be interpreted as years if no format is provided. We use
"strict_date_optional_time||epoch_millis" in this case that can interpret inputs
like 1000 as the year 1000 for example.
This PR change this to always interpret and parse numbers with the "epoch_millis"
parser if no other formatter was provided.
Closes#63680
Today we describe snapshots as "incremental" but their incrementality is
rather different beast from e.g. incremental filesystem backups. With
traditional backups you take a large and relatively infrequent "full"
backup and then a sequence of smaller "incremental" ones, and this whole
sequence of backups is required for a restore so it must be kept around
until at least the next full backup. In contrast, Elasticsearch
snapshots are logically independent and each can be deleted without
affecting the integrity of the others.
This distinction frequently causes confusion amongst newer users, so
this commit clarifies what we mean by "incremental" in the docs.
Whether the cold tier can handle years depends a lot on the use case and
for instance our BWC guarantees. This would need to be part of a
specific sizing exercise, so in the spirit of not over-promising, the
description of the cold tier has been changed to not mention years.
Generating a CA on the fly is an attempt at workflow optimisation that was
inherited from certgen. There are potential pitfalls with this approach. Overall
it is recommended to separate the step of CA creation and mandate a CA to be
specified when generating certificate.
This PR add a deprecation message if the cert command is used without specifying
a CA. A follow up PR will throw error for this usage in 8.0.
For use case where we explicitly trust a certificate without needing a CA, e.g.
SAML message signing, the PR adds a --self-signed option to the cert sub-command
to generate self-signed certificate.
Clarify that searchable snapshots only result in cost savings for less
frequently accessed data and that the savings do not apply to the entire
cluster.
* Introduce an additional hasher that is PBKDF2 but pads the input to > 14 chars before hashing to comply with FIPS Approve Only mode
* Introduce an additional hasher that is PBKDF2 but pads the input to > 14 chars before hashing to comply with FIPS Approve Only mode
* Addressing the PR feedback
adding doc changes
* Renaming the hash function + rephrasing the doc descriptions
* Removing leftover from the doc
* Return HexCharArray instead of Base64 encoding and avoid intermediate
String
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
During highlighting, we now load all values that were copied into the field
through copy_to. So there's no longer a reason to set 'store: true' to account
for fields not available in _source.
In some cases when the rate aggregation is not a child of a date histogram
aggregation, it is not possible to determine the actual size of the date
histogram bucket. In this case the rate aggregation now throws an exception.
Closes#63703
Previously, geo_shape support was only mentioned in a dedicated x-pack
section. This may be misleading, as the introductory paragraph only
mentions geo_point.
Co-authored-by: James Rodewig <40268737+jrodewig@users.noreply.github.com>
* Adds the capability to have functions with two optional arguments
* Adds two new optional arguments to `PERCENTILE()` and
`PERCENTILE_RANK()` functions, namely the method and
method_parameter which can be: 1) `tdigest` and a double `compression`
parameter or 2) `hdr` and an integer representing the
`number_of_digits` parameter.
* Integration tests
* Documentation updates
Closes#63567
This PR adds detail to the explanation of the soft_limit
memory_status in ML job stats. A consequence that was not
mentioned before is that examples are not added to category
definitions.
Relates elastic/ml-cpp#1590
A metric aggregation that aggregates a set of points as
a GeoJSON LineString ordered by some sort parameter.
#### specifics
A `geo_line` aggregation request would specify a `geo_point` field, as well
as a `sort` field. `geo_point` represents the values used in the LineString,
while the `sort` values will be used as the total ordering of the points.
the `sort` field would support any numeric field, including date.
#### sample usage
```
{
"query": {
"bool": {
"must": [
{ "term": { "person": "004" } },
{ "term": { "trajectory": "20090131002206.plt" } }
]
}
},
"aggs": {
"make_line": {
"geo_line": {
"point": {"field": "location"},
"sort": { "field": "timestamp" },
"include_sort": true,
"sort_order": "desc",
"size": 15
}
}
}
}
```
#### sample response
```
{
"took": 21,
"timed_out": false,
"_shards": {...},
"hits": {...},
"aggregations": {
"make_line": {
"type": "LineString",
"coordinates": [
[
121.52926194481552,
38.92878997139633
],
[
121.52922699227929,
38.92876998055726
],
]
}
}
}
```
#### visual response
<img width="540" alt="Screen Shot 2019-04-26 at 9 40 07 AM" src="https://user-images.githubusercontent.com/388837/56834977-cf278e00-6827-11e9-9c93-005ed48433cc.png">
#### limitations
Due to the cardinality of points, an initial max of 10k points
will be used. This should support many use-cases.
One solution to overcome this limitation is to keep a PriorityQueue of
points, and simplifying the line once it hits this max. If simplifying
makes sense, it may be a nice option, in general. The ability to use a parameter
to specify how aggressive one wants to simplify. This parameter could be
the number of points. Example algorithm one could use with a PriorityQueue:
https://bost.ocks.org/mike/simplify/. This would still require O(m) space, where m
is the number of points returned. And would also require heapifying triangles
sorted by their areas, which would be O(log(m)) operations. Since sorting is done,
anyways, simplifying would still be a O(n log(m)) operation, where n is the total number
of points to filter........... something to explore
closes#41649
The _Important Elasticsearch configuration_ docs lists a number of items
that you should consider before moving to production. Today this list
does not include configuring snapshots, even though they're very
important to have in production. This commit addresses that omission,
removes some repetition from the introductory paragraphs, and notes that
this config is handled for you on Cloud.
Co-authored-by: James Rodewig <40268737+jrodewig@users.noreply.github.com>
* Clarify that field data cache includes global ordinals
* Describe that the cache should be cleared once the limit is reached
* Clarify that the `_id` field does not supported aggregations anymore
* Fold the `fielddata` mapping parameter page into the `text field docs
* Improve cross-linking
This change adds an extra piece of information,
limits.total_ml_memory, to the ML info response.
This returns the total amount of memory that ML
is permitted to use for native processes across
all ML nodes in the cluster. Some of this may
already be in use; the value returned is total,
not available ML memory.
Clarifies differences between the
`cluster.routing.allocation.total_shards_per_node` and
`cluster.max_shards_per_node` cluster settings.
Closes#51839
Co-authored-by: Gordon Brown <arcsech@gmail.com>
This new API provides a way for users to upgrade their own anomaly job
model snapshots.
To upgrade a snapshot the following is done:
- Open a native process given the job id and the desired snapshot id
- load the snapshot to the process
- write the snapshot again from the native task (now updated via the
native process)
relates #64154
Include the attempted 'match_mapping_type' into the message,
so that it is clearer that multiple validation attempts have occurred.
Dynamic template validation was recently added via #51233 and
there was some confusion over the deprecation message itself.
(in 7.x only deprecation warning will be omitted and from 8.0
an error will be returned)
This adds support for the searchable_snapshot ILM action in the hot phase.
We define a series of actions that cannot be executed after the index has been
mounted as a searchable snapshot. Namely: freeze, forcemerge, shrink,
and searchable_snapshot (also available in the cold phase).
If by virtue of snapshot/restoring a managed index or updating an ILM policy while it
is executing for an index, these actions could get to be executed on an index that was
mounted as searchable snapshot in the hot phase. If this happens the actions will
skip entirely. ILM will not move into the ERROR step.
In the process of developing a new implementation for the Elasticsearch Rollups functionality we came up with the concept of the aggregate metric field type.
The aggregate_metric_double field type can store the results of aggregations (currently min, max, sum, value_count and avg are supported - more to come).
This field allows us to run (min, max, sum, value_count, avg) aggregations on the container field and the field will return the correct metric depending on the aggregation that is computed.
I'm not sure if this setting was left here deliberately? or by accident?
With all other node role definition has changed syntax from `node.xxx` to `node.roles: [ ]`, the ingest one is the only one left behind.
Added the capability to delete autoscaling policies by pattern, allowing
to for instance do:
```
DELETE _autoscaling/policy/*
```
to delete all autoscaling policies. If a wildcard is involved, no
matches are required.
* Remove constant_keyword from SQL docs
`constant_keyword` removed as distinct type from SQL in #60524.
Co-authored-by: James Rodewig <40268737+jrodewig@users.noreply.github.com>
Adds a limit to the maximum number of snapshots that are allowed
to be added to a snapshot repository as a safety measure of last resort
against repositories that grow to an unmanagable size due to e.g. incorrect SLM
settings.
Co-authored-by: David Turner <david.turner@elastic.co>
Bucket aggregations compute bucket doc_count values by incrementing the doc_count by 1 for every document collected in the bucket.
When using summary fields (such as aggregate_metric_double) one field may represent more than one document. To provide this functionality we have implemented a new field mapper (named doc_count field mapper). This field is a positive integer representing the number of documents aggregated in a single summary field.
Bucket aggregations will check if a field of type doc_count exists in a document and will take this value into consideration when computing doc counts.
This commit adds support data stream support to CCR's auto following by making the following changes:
* When the auto follow coordinator iterates over the candidate indices to follow,
the auto follow coordinator also checks whether the index is part of a data stream and
if the name of data stream also matches with the auto follow pattern then the index
will be auto followed.
* When following an index, the put follow api also checks whether that index is part
of a data stream and if so then also replicates the data stream definition to the
local cluster.
* In order for the follow index api to determine whether an index is part of a data
stream, the cluster state api was modified to also fetch the data stream definition
of the cluster state if only the state is queried for specific indices.
When a data stream is auto followed, only new backing indices are auto followed.
This is in line with how time based indices patterns are replicated today. This
means that the data stream isn't copied 1 to 1 into the local cluster. The local
cluster's data stream definition contains the same name, timestamp field and
generation, but the list of backing indices may be different (depending on when
a data stream was auto followed).
Closes#56259