Commit Graph

2155 Commits

James Baiera 589bb1f26c
Data Stream Stats API (#58707)
This API reports on statistics important for data streams, including the number of data 
streams, the number of backing indices for those streams, the disk usage for each data 
stream, and the maximum timestamp for each data stream.
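A hedged sketch of how the new endpoint might be queried; the data stream name `my-data-stream` is illustrative:

```
GET /_data_stream/_stats

GET /_data_stream/my-data-stream/_stats?human=true
```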
2020-07-14 14:51:32 -04:00
Tim Brooks aa14860597
Separate coordinating and primary bytes in stats (#59487)
Currently we combine coordinating and primary bytes into a single bucket
for indexing pressure stats. This makes sense for rejection logic.
However, for metrics it would be useful to separate them.
2020-07-14 12:22:42 -06:00
Mayya Sharipova b130a1bfbb
Fix the test version in highlighters test (#59515)
This was supposed to be done after the backport but was missed.

Related to #53408
2020-07-14 11:24:34 -04:00
Dan Hermann 1d3a723f68
Adds write_index_only option to put mapping API (#59396) 2020-07-14 08:25:10 -05:00
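A hedged illustration of the new option, assuming a target named `my-data-stream` and an illustrative field:

```
PUT /my-data-stream/_mapping?write_index_only=true
{
  "properties": {
    "message": { "type": "text" }
  }
}
```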
Andrei Dan 5609353c5d
Default to @timestamp in composable template datastream definition (#59317)
This makes the data_stream timestamp field specification optional when
defining a composable template.
When there isn't one specified it will default to `@timestamp`.
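A minimal sketch of a composable template that omits the timestamp field specification and so relies on the `@timestamp` default; the template name and pattern are illustrative:

```
PUT /_index_template/my-logs-template
{
  "index_patterns": ["logs-example-*"],
  "data_stream": {}
}
```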
2020-07-14 11:45:48 +01:00
Andrei Dan 4e72f43d62
Composable templates: add a default mapping for @timestamp (#59244)
This adds a low precedence mapping for the `@timestamp` field with
type `date`.
This will aid with the bootstrapping of data streams as a timestamp
mapping can be omitted when nanos precision is not needed.
2020-07-14 09:19:00 +01:00
Tim Brooks 9f22634bcd
Update versions for indexing pressure backport (#59472)
This commit updates the node stats version constants to reflect the fact
that indexing pressure stats were backported to 7.9. It also re-enables the BWC
tests.
2020-07-13 18:29:32 -06:00
Igor Motov 12c61e0d80
Change 7.9.99 -> 7.99.99 in tests (#59469)
Since we are most likely going to have 7.10, we should update the version
in test skips to 7.99.99.
2020-07-13 17:26:15 -04:00
Igor Motov 0af410ad0b
Adds hard_bounds to histogram aggregations (#59175)
* Adds hard_bounds to histogram aggregations

Adds a hard_bounds parameter to explicitly limit the buckets that a histogram
can generate. This is especially useful in the case of open-ended ranges that can
produce a very large number of buckets.
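A hedged example of the new parameter; the field name, interval, and bounds are illustrative:

```
POST /_search
{
  "size": 0,
  "aggs": {
    "prices": {
      "histogram": {
        "field": "price",
        "interval": 10,
        "hard_bounds": {
          "min": 0,
          "max": 500
        }
      }
    }
  }
}
```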

Co-authored-by: Mark Tozzi <mark.tozzi@gmail.com>
2020-07-13 16:43:11 -04:00
Nik Everett e6e906d0e7 Update skip after backport of #42035 2020-07-13 16:39:30 -04:00
Tim Brooks b87bb86d88
Adding indexing pressure stats to node stats API (#59247)
We have recently added internal metrics to monitor the amount of
indexing occurring on a node. These metrics introduce back pressure to
indexing when memory utilization is too high. This commit exposes these
stats through the node stats API.
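A hedged way to read the new stats; filtering node stats by an `indexing_pressure` metric name is an assumption about how the section is exposed:

```
GET /_nodes/stats/indexing_pressure
```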
2020-07-13 10:37:46 -06:00
Martijn van Groningen 40b9fd49e0
Make data streams a basic licensed feature. (#59293)
* Create new data-stream xpack module.
* Move TimestampFieldMapper to the new module;
  as a result, storing a composable index template
  with a data stream definition only works with the default
  distribution. This way data streams can only be used
  with the default distribution, since a data stream can
  currently only be created if a matching composable index
  template with a data stream definition exists.
* Renamed `_timestamp` meta field mapper
  to `_data_stream_timestamp` meta field mapper.
* Add logic to the put composable index template API
  to fail if the `_data_stream_timestamp` meta field mapper
  isn't registered, so that a more understandable
  error is returned when attempting to store a template
  with a data stream definition via the OSS distribution.

In a follow-up, the data stream transport and
REST actions can be moved to the xpack data-stream module.
2020-07-13 11:43:42 +02:00
Dan Hermann 198b4253d9
Data stream admin actions are now index-level actions (#59095) 2020-07-10 12:25:07 -05:00
Martijn van Groningen cb6b05d12b
Fix the timestamp field of a data stream to @timestamp (#59076)
The commit makes the following changes:
* The timestamp field of a data stream definition in a composable
  index template can only be set to '@timestamp'.
* Removed custom data stream timestamp field validation, reusing the validation from `TimestampFieldMapper` and
  instead only checking that the _timestamp field mapping has been defined on a backing index of a data stream.
* Moved the code that injects the _timestamp meta field mapping from the `MetadataCreateIndexService#applyCreateIndexRequestWithV2Template(...)` method
  to the `MetadataIndexTemplateService#collectMappings(...)` method.
* Fixed a bug (#58956) that caused timestamp field validation to be performed
  for each template instead of on the final mappings that are created.
* Only apply the _timestamp meta field if an index is created as part of a data stream or a data stream rollover;
  this fixes a docs test where a regular index creation matches (logs-*) with a template with a data stream definition.

Relates to #58642
Relates to #53100
Closes #58956
Closes #58583
2020-07-08 09:41:47 +02:00
James Rodewig 2be9db01c8
[DOCS] Replace `datatype` with `data type` (#58972) 2020-07-07 13:52:10 -04:00
Andrei Dan 0d9c98a823
GET data stream API returns additional information (#59128)
This adds the data stream's index template, the configured ILM policy
(if any) and the health status of the data stream to the GET _data_stream
response.

Restoring a data stream from a snapshot could install a data stream that
doesn't match any composable templates, so this also makes the `template`
field in the `GET _data_stream` response optional.
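A hedged request showing where the additional information would surface; the data stream name is illustrative, and the response is expected to carry the matching `template`, an `ilm_policy` if one is configured, and a `status` per data stream:

```
GET /_data_stream/my-data-stream
```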
2020-07-07 17:52:40 +01:00
Nik Everett 38667b8fe5
Update skip after backport (#59153)
Update a skip after backporting #59099.
2020-07-07 11:33:01 -04:00
Jake Landis bd49766fde
Add build plugin to rest-api-spec to properly generate pom (#59142)
The build plugin is still necessary to generate the POM file.
This commit adds the build plugin back to the rest-api-spec
project. It was recently removed as it was thought to be
unnecessary.
2020-07-07 09:58:07 -05:00
Nik Everett 3b3ed4b4a7
Fix lookup support in adjacency matrix (#59099)
This request:
```
POST /_search
{
  "aggs": {
    "a": {
      "adjacency_matrix": {
        "filters": {
          "1": {
            "terms": { "t": { "index": "lookup", "id": "1", "path": "t" } }
          }
        }
      }
    }
  }
}
```

Would fail with a 500 error and a message like:
```
{
  "error": {
    "root_cause": [
      {
        "type": "illegal_state_exception",
        "reason":"async actions are left after rewrite"
      }
    ]
  }
}
```

This fixes that by moving the query rewrite phase from a synchronous
call on the data nodes into the standard aggregation rewrite phase which
can properly handle the asynchronous actions.
2020-07-06 18:53:19 -04:00
Jake Landis 333a5d8cdf
Create plugin for yamlTest task (#56841)
This commit creates a new Gradle plugin to provide a separate task name
and source set for running YAML based REST tests. The only project
converted to use the new plugin in this PR is distribution/archives/integ-test-zip,
for which the testing has been moved to :rest-api-spec, since that makes the most
sense and avoids a small but awkward change to the distribution plugin.

The remaining cases in modules, plugins, and x-pack will be handled in followups.

This plugin is distinctly different from the plugin introduced in #55896 since
the YAML REST tests are intended to be black box tests over HTTP. As such they
should not (by default) have access to the classpath for that which they are testing.

The YAML based REST tests will be moved to separate source sets (yamlRestTest).
Which source set is the target for the test resources depends on whether this
new plugin is applied. If it is not applied, it will default to the test source
set.

Further, this introduces a breaking change for plugin developers that
use the YAML testing framework. They will now need to either use the new source set
and matching task, or configure the rest resources to use the old "test" source set that
matches the old integTest task. (The former should be preferred).

As part of this change (which is also breaking for plugin developers) the
rest resources plugin has been removed from the build plugin and now requires
either explicit application or application via the new YAML REST test plugin.

Plugin developers should be able to fix the breaking changes to the YAML tests
by adding `apply plugin: 'elasticsearch.yaml-rest-test'` and moving the YAML tests
under a `yamlRestTest` folder (instead of `test`).
2020-07-06 12:13:01 -05:00
Dan Hermann be45d016e4
Delete data stream API accepts multiple names (#58833) 2020-07-06 06:45:58 -05:00
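A hedged illustration of passing multiple names; the data stream names are placeholders:

```
DELETE /_data_stream/logs-app1,logs-app2
```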
Dan Hermann 100e7a6651
Data stream support for indices shard stores API (#58591) 2020-07-06 06:44:31 -05:00
Dan Hermann c3aaf33d73
Ignore matching data streams if include_data_streams is false (#57900) 2020-07-03 08:46:01 -05:00
Dan Hermann e592a9a5e7
Add include_data_streams flag for authorization (#58154) 2020-07-02 23:53:31 -05:00
Dan Hermann 50ed781c9f
Mirror privileges over data streams to their backing indices (#58381) 2020-07-02 21:56:16 -05:00
Martijn van Groningen 001b3fb440
Add data stream timestamp validation via metadata field mapper (#58582)
This commit adds a new metadata field mapper that validates
that a document has exactly one timestamp value in the data stream timestamp field and
that the timestamp field mapping only has `type`, `meta` or `format` attributes configured.
Other attributes can affect the guarantee that an index with this meta field mapper has a
usable timestamp field.
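A hedged sketch of a timestamp field mapping that keeps to the allowed attributes (`type`, `format`, `meta`); the surrounding template is illustrative and assumes the later default of `@timestamp` as the timestamp field:

```
PUT /_index_template/logs-template
{
  "index_patterns": ["logs-example-*"],
  "data_stream": {},
  "template": {
    "mappings": {
      "properties": {
        "@timestamp": {
          "type": "date",
          "format": "date_optional_time",
          "meta": { "description": "event timestamp" }
        }
      }
    }
  }
}
```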

The MetadataCreateIndexService inserts a data stream timestamp field mapper whenever
a new backing index of a data stream is created.

Relates to #53100
2020-07-02 10:58:18 +02:00
David Turner acf031cdb5
Forbid read-only-allow-delete block in blocks API (#58727)
* Forbid read-only-allow-delete block in blocks API

The read-only-allow-delete block is not really under the user's control
since Elasticsearch adds/removes it automatically. This commit removes
support for it from the new API for adding blocks to indices that was
introduced in #58094.

* Missing xref

* Reword paragraph on read-only-allow-delete block
2020-07-01 12:57:34 +01:00
David Turner 83d6589b2a
Account for remaining recovery in disk allocator (#58029)
Today the disk-based shard allocator accounts for incoming shards by
subtracting the estimated size of the incoming shard from the free space on the
node. This is an overly conservative estimate if the incoming shard has almost
finished its recovery since in that case it is already consuming most of the
disk space it needs.

This change adds to the shard stats a measure of how much larger each store is
expected to grow, computed from the ongoing recovery, and uses this to account
for the disk usage of incoming shards more accurately.
2020-07-01 08:04:45 +01:00
Lee Hinman 29c05544ec
Fix template name in mapping composition yml test (#58788)
The warning was copied from elsewhere and just needed to use the correct template and index name.
2020-06-30 17:03:47 -06:00
Julie Tibshirani 416cb6b31e Adjust the skip version for template mapping merging REST test. 2020-06-30 13:22:16 -07:00
Martijn van Groningen 906aed4a88
Add data stream support to put mapping and update index settings APIs. (#58231)
Change the update index settings and put mapping APIs
to execute on all backing indices if a data stream is targeted.
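A hedged example of targeting a data stream directly; the name and setting are illustrative, and the change would be applied to every backing index:

```
PUT /my-data-stream/_settings
{
  "index": {
    "refresh_interval": "30s"
  }
}
```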

Relates #53100
2020-06-30 17:23:27 +02:00
Lee Hinman 3b68df2355
Add default composable templates for new indexing strategy (#57629)
This commit adds the component and composable templates, as well as ILM policies, for the new
default indexing strategy. It installs:

- logs-default-mappings (component)
- logs-default-settings (component)
- logs-default-policy (ilm policy)
- logs-default-template (composable template)
- metrics-default-mappings (component)
- metrics-default-settings (component)
- metrics-default-policy (ilm policy)
- metrics-default-template (composable template)

These templates and policies are managed by a new x-pack module, `stack`, and can be disabled by
setting `stack.templates.enabled` to `false`.

These ensure that patterns for the `logs-*-*` and `metrics-*-*` indices are set up to create data
streams with the proper mappings and settings.

This also makes changes to the `IndexTemplateRegistry` to support installing component and
composable templates (previously it supported only legacy templates).

Resolves #56709
2020-06-30 09:19:37 -06:00
Yannick Welsch e4df92815e Adapt BWC after backport of (#58094) 2020-06-30 14:09:03 +02:00
Yannick Welsch 5e345e115b
Add index block api (#58094)
Adds an API for putting an index block in place. For write blocks, the API also ensures that, once the call successfully returns to
the user, all shards of the index are properly accounting for the block, for example that all in-flight writes to the index have
been completed after adding the write block.

This API allows coordinating more complex workflows, where it is crucial that an index is no longer receiving writes after
the API completes, useful for example when marking an index as read-only during an upgrade in order to reindex its
documents.
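A hedged sketch of adding a write block through the new API; the index name is illustrative:

```
PUT /my-index/_block/write
```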
2020-06-30 09:33:15 +02:00
Julie Tibshirani 676893a263
Merge mappings for composable index templates (#58521)
This PR implements recursive mapping merging for composable index templates.

When creating an index, we perform the following:
* Add each component template mapping in order, merging each one in after the
last.
* Merge in the index template mappings (if present).
* Merge in the mappings on the index request itself (if present).

Some principles:
* All 'structural' changes are disallowed (but everything else is fine). An
object mapper can never be changed between `type: object` and `type: nested`. A
field mapper can never be changed to an object mapper, and vice versa.
* Generally, each section is merged recursively. This includes `object`
mappings, as well as root options like `dynamic_templates` and `meta`. Once we
reach 'leaf components' like field definitions, they always overwrite an
existing one instead of being merged.
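A hedged sketch of the merge behaviour described above, with illustrative template names and fields; the `host` object defined in the component template and in the index template would be merged recursively rather than overwritten:

```
PUT /_component_template/base-mappings
{
  "template": {
    "mappings": {
      "properties": {
        "host": {
          "properties": {
            "name": { "type": "keyword" }
          }
        }
      }
    }
  }
}

PUT /_index_template/web-logs
{
  "index_patterns": ["web-logs-*"],
  "composed_of": ["base-mappings"],
  "template": {
    "mappings": {
      "properties": {
        "host": {
          "properties": {
            "ip": { "type": "ip" }
          }
        }
      }
    }
  }
}
```

An index matching `web-logs-*` would then end up with both `host.name` and `host.ip` in its mappings.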

Relates to #53101.
2020-06-29 15:00:40 -07:00
Enrico Zimuel 9a7a28958a
Added PERL reserved words in REST keywords (#58535) 2020-06-26 12:11:22 +02:00
Dan Hermann edc15d7c90
Add data stream support to open index API (#58487) 2020-06-25 09:13:53 -05:00
Dan Hermann 603c5f7a48
Data stream support for get field mappings API (#58488)
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
2020-06-25 08:08:51 -05:00
Dan Hermann 991f635c7f
Data stream support for search shards API (#58486) 2020-06-25 07:57:56 -05:00
Rory Hunter 48c9a0776b Update rest-api-spec keyword list
Follow-up to 35aecf4c9a. Somehow I missed the fact that there's an ILM
API named `retry`, which is a keyword in Ruby. I've removed it from the
keywords list.
2020-06-25 09:53:16 +01:00
Rory Hunter 35aecf4c9a
Validate that REST API names do not contain keywords (#58452)
If an API name (or components of a name) overlaps with a reserved word in
the programming language for an ES client, then it's possible that the code
that is generated from the API will not compile. This PR adds validation to
check for such overlaps.
2020-06-25 09:47:05 +01:00
Martijn van Groningen 0166e0e5a3
Re-enable data streams yaml tests in bwc mode (#58403) 2020-06-24 14:55:57 +02:00
James Dorfman e99d287fbb
Add Variable Width Histogram Aggregation (#42035)
Implements a new histogram aggregation called `variable_width_histogram` which
dynamically determines bucket intervals based on document groupings. These
groups are determined by running a one-pass clustering algorithm on each shard
and then reducing each shard's clusters using an agglomerative
clustering algorithm.
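A hedged request shape for the new aggregation; the field name and bucket target are illustrative:

```
POST /_search
{
  "size": 0,
  "aggs": {
    "prices": {
      "variable_width_histogram": {
        "field": "price",
        "buckets": 10
      }
    }
  }
}
```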

This PR addresses #9572.

The shard-level clustering is done in one pass to minimize memory overhead. The
algorithm was lightly inspired by
[this paper](https://ieeexplore.ieee.org/abstract/document/1198387). It fetches
a small number of documents to sample the data and determine initial clusters.
Subsequent documents are then placed into one of these clusters, or a new one
if they are an outlier. This algorithm is described in more detail in the
aggregation's docs.

At reduce time, a
[hierarchical agglomerative clustering](https://en.wikipedia.org/wiki/Hierarchical_clustering)
algorithm inspired by [this paper](https://arxiv.org/abs/1802.00304)
continually merges the closest buckets from all shards (based on their
centroids) until the target number of buckets is reached.

The final values produced by this aggregation are approximate. Each bucket's
min value is used as its key in the histogram. Furthermore, buckets are merged
based on their centroids and not their bounds. So it is possible that adjacent
buckets will overlap after reduction. Because each bucket's key is its min,
this overlap is not shown in the final histogram. However, when such overlap
occurs, we set the key of the bucket with the larger centroid to the midpoint
between its minimum and the smaller bucket’s maximum:
`min[large] = (min[large] + max[small]) / 2`. This heuristic is expected to
increase the accuracy of the clustering.

Nodes are unable to share centroids during the shard-level clustering phase. In
the future, resolving https://github.com/elastic/elasticsearch/issues/50863
would let us solve this issue. 

It doesn’t make sense for this aggregation to support the `min_doc_count`
parameter, since clusters are determined dynamically. The `order` parameter is
not supported here to keep this large PR from becoming too complex.
2020-06-23 09:26:54 -04:00
Martijn van Groningen 085ba99fba
Keep track of timestamp_field mapping as part of a data stream (#58096)
Relates to #53100

* Use mapping source directly instead of using the mapper service to extract the relevant mapping details
* Moved assertion to TimestampField class and added helper method for tests
* Improved logic that inserts the timestamp field mapping into a mapping.
  If the timestamp field path consisted of object fields and
  the final mapping did not contain the parent field, then an error
  occurred, because the prior logic assumed that the object field existed.
2020-06-22 12:01:01 +02:00
Jim Ferenczi b0f4024879
Adapt bwc version after backport of #58299 (#58300)
This commit adapts the bwc version in preparation for the backport
to 7.x. The bwc tests are disabled in order to allow the merge of
#58299.

Relates #58299
2020-06-18 10:22:54 +02:00
Rory Hunter 1f6c953194
Rename dangling index APIs (#58266)
The dangling_indices.import API name could cause issues in the client
libs because import is a reserved word in many languages. Rename the
API to avoid this, and rename the other APIs for consistency.

Related to #48366.
2020-06-18 08:57:39 +01:00
Jim Ferenczi 90c9b95ca0
Allow index filtering in field capabilities API (#57276)
* Add index filtering in field capabilities API

This change allows using an `index_filter` in the
field capabilities API. Indices are filtered from
the response if the provided query rewrites to `match_none`
on every shard:

```
GET metrics-*/_field_caps?fields=*
{
  "index_filter": {
    "bool": {
      "must": [
        {
          "range": {
            "@timestamp": {
              "gt": "2019"
            }
          }
        }
      ]
    }
  }
}
```

The filtering is done on a best-effort basis: it uses the can-match phase
to rewrite queries to `match_none` instead of fully executing the request.
The first shard that can match the filter is used to create the field
capabilities response for the entire index.

Closes #56195
2020-06-17 22:53:53 +02:00
Rory Hunter ebe8951879
Implement dangling indices API (#50920)
Part of #48366. Implement an API for listing, importing and deleting dangling
indices.
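A hedged sketch of the three operations; the index UUID is a placeholder, and importing or deleting a dangling index requires explicitly accepting possible data loss:

```
GET /_dangling

POST /_dangling/zmM4e0JtBkeUjiHD-MihPQ?accept_data_loss=true

DELETE /_dangling/zmM4e0JtBkeUjiHD-MihPQ?accept_data_loss=true
```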

Co-authored-by: David Turner <david.turner@elastic.co>
2020-06-16 15:19:17 +01:00
Dan Hermann cce279bbb3
Prohibit clone, shrink, and split on a data stream's write index (#58104) 2020-06-16 08:37:48 -05:00
Nik Everett 7c7fe0152d
Save memory when auto_date_histogram is not on top (#57304)
This builds an `auto_date_histogram` aggregator that natively aggregates
from many buckets, and uses it where `auto_date_histogram` used to rely on
`asMultiBucketAggregator`, which should save a significant amount of
memory in those cases. In particular, this happens when
`auto_date_histogram` is a sub-aggregator of a multi-bucketing aggregator
like `terms` or `histogram` or `filters`. For the most part we preserve
the original implementation when `auto_date_histogram` only collects from
a single bucket.

It isn't possible to "just port the aggregator" without taking a pretty
significant performance hit because we used to rewrite all of the
buckets every time we switched to a coarser and coarser rounding
configuration. Without some major surgery to how we delay sub-aggs
we'd end up rewriting the delay list zillions of times if there are many
buckets.

The multi-bucket version of the aggregator has a "budget" of "wasted"
buckets and only rewrites all of the buckets when we exceed that budget.
Now that we don't rebucket every time we increase the rounding we can no
longer get an accurate count of the number of buckets! So instead the
aggregator uses an estimate of the number of buckets to trigger switching
to a coarser rounding. This estimate is likely to be *terrible* when
buckets are far apart compared to the rounding. So it also uses the
difference between the first and last bucket to trigger switching to a
coarser rounding. Which covers for the shortcomings of the bucket
estimation technique pretty well. It also causes the aggregator to emit
fewer buckets in cases where they'd be reduced together on the
coordinating node. This is wonderful! But probably fairly rare.

All of that does buy us some speed improvements when the aggregator is
a child of multi-bucket aggregator:
Without metrics or time zone: 25% faster
With metrics: 15% faster
With time zone: 22% faster

Relates to #56487
2020-06-15 14:33:31 -04:00