6664 Commits

Author SHA1 Message Date
rhymes 74b9878f69 [DOCS] Fix typo in index and search analysis docs (#52988) 2020-03-02 07:22:50 -05:00
István Zoltán Szabó 30bba95902
[DOCS] Adds transform-settings page and its subpage to redirects (#52944)
Co-authored-by: Lisa Cawley <lcawley@elastic.co>
2020-03-02 10:48:09 +01:00
Hendrik Muhs 563d906a78
[Transform] implement node.transform to control where to run a transform (#52712)
implement transform node attributes to disable transform on certain nodes and test which nodes are allowed to do remote connections

closes #52200
closes #50033
closes #48734
2020-03-02 09:01:18 +01:00
Rory Hunter 3950c3a9da
Document how to change GC logging behaviour (#52879)
Closes #43990. Describe how to change the default GC settings without changing
the default `jvm.options`. Give examples using `jvm.options.d`, and
`ES_JAVA_OPTS` with Docker.
2020-02-28 21:26:53 +00:00
Lisa Cawley b6534834f9
[DOCS] Adds cat anomaly detectors API (#52866) 2020-02-28 12:15:21 -08:00
Dimitris Athanasiou dd331935b3
[ML] Parse and report memory usage for DF Analytics (#52778)
Adds reporting of memory usage for data frame analytics jobs.
This commit introduces a new index pattern `.ml-stats-*` whose
first concrete index will be `.ml-stats-000001`. This index serves
to store instrumentation information for those jobs.
2020-02-28 17:35:07 +02:00
Benjamin Trent d7a63333b5
[ML] Add indices_options to datafeed config and update (#52793)
This adds a new configurable field called `indices_options`. This allows users to create or update the indices_options used when a datafeed reads from an index. 

This is necessary for the following use cases:
 - Reading from frozen indices
 - Allowing certain indices in multiple index patterns to not exist yet

These index options are available on datafeed creation and update. Users may specify them as URL parameters or within the configuration object.
 
closes https://github.com/elastic/elasticsearch/issues/48056
2020-02-27 12:22:35 -05:00
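A minimal sketch of what a datafeed configuration carrying `indices_options` might look like, built as a plain Python dictionary. The helper `build_datafeed_config`, the job ID, and the index patterns are hypothetical, and the option keys shown (`ignore_throttled`, `ignore_unavailable`, `allow_no_indices`, `expand_wildcards`) are assumed to mirror the usual multi-index options; the commit message itself does not list them.

```python
import json
from typing import Dict, Iterable, Optional


def build_datafeed_config(job_id: str,
                          indices: Iterable[str],
                          indices_options: Optional[Dict] = None) -> Dict:
    """Assemble a datafeed configuration body (illustrative only)."""
    config = {"job_id": job_id, "indices": list(indices)}
    # Only include the field when the caller supplies it, so defaults apply otherwise.
    if indices_options is not None:
        config["indices_options"] = dict(indices_options)
    return config


# Hypothetical job and index names; option keys are assumptions.
body = build_datafeed_config(
    "my-job",
    ["metrics-*", "frozen-metrics-*"],
    {
        "ignore_throttled": False,   # assumed: needed to read frozen indices
        "ignore_unavailable": True,  # assumed: tolerate indices that do not exist yet
        "allow_no_indices": True,
        "expand_wildcards": ["open"],
    },
)
print(json.dumps(body, indent=2))
```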
Nattachai Suteerapongpan 76c1c300c9
[DOCS] Fix typo in task management API docs (#52881) 2020-02-27 11:30:03 -05:00
Nik Everett f4223b6a8f
Add size support to `top_metrics` (#52662)
This adds support for returning the top "n" metrics instead of just the
very top.

Relates to #51813
2020-02-27 11:14:57 -05:00
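The idea of returning the top "n" metrics rather than only the very top one can be illustrated locally. The function below is a hypothetical in-memory emulation, not the aggregation's implementation: it sorts documents by one field and returns the metric value of the first `size` documents. The result shape is an assumption made for illustration.

```python
from typing import Any, Dict, List


def top_metrics(docs: List[Dict[str, Any]],
                sort_field: str,
                metric_field: str,
                size: int = 1,
                descending: bool = True) -> List[Dict[str, Any]]:
    """Return the metric of the top `size` documents ordered by `sort_field`."""
    ranked = sorted(docs, key=lambda d: d[sort_field], reverse=descending)
    return [
        {"sort": [d[sort_field]], "metrics": {metric_field: d[metric_field]}}
        for d in ranked[:size]
    ]


docs = [{"s": 1, "m": 10.0}, {"s": 3, "m": 30.0}, {"s": 2, "m": 20.0}]
very_top = top_metrics(docs, "s", "m")          # default size of 1: only the very top
top_two = top_metrics(docs, "s", "m", size=2)   # the top two
```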
István Zoltán Szabó efc70837aa
[DOCS] Reformats cat DFA API docs. (#52885) 2020-02-27 14:20:29 +01:00
István Zoltán Szabó 03d95372d7
[DOCS] Adds cat trained model API documentation (#52824) 2020-02-27 12:50:11 +01:00
Martijn van Groningen 31b29875c9
Add validation for dynamic templates (#51233)
Tries to load a `Mapper` instance for the mapping snippet of a dynamic template.
This should catch things like using an analyzer that is undefined or mapping attributes that are unused.

This is best effort:
* If `{{name}}` placeholder is used in the mapping snippet then validation is skipped.
* If `match_mapping_type` is not specified then validation is performed for all mapping types.
  If parsing succeeds with a single mapping type then the dynamic mapping is considered valid.

If it is detected that a dynamic template mapping snippet is invalid at mapping update time, then the mapping update fails for indices created on 8.0.0-alpha1 and later. For indices created on prior versions, a deprecation warning is emitted instead. In 7.x clusters the mapping update never fails because of an invalid dynamic template mapping snippet, and a deprecation warning is always emitted.

Closes #17411
Closes #24419

Co-authored-by: Adrien Grand <jpountz@gmail.com>
2020-02-27 11:52:27 +01:00
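The best-effort rules described in this commit can be sketched as follows. This is an illustration only, not the actual implementation: `validate_dynamic_template`, `ALL_MAPPING_TYPES`, and the `try_parse` callback (a stand-in for loading a `Mapper`) are all hypothetical. It also assumes that when `match_mapping_type` is given, only that type is tried.

```python
import json
from typing import Callable, Dict, Optional

# Hypothetical list of mapping types to try when none is declared.
ALL_MAPPING_TYPES = ["object", "string", "long", "double", "boolean", "date", "binary"]


def validate_dynamic_template(template: Dict,
                              try_parse: Callable[[Dict, str], bool]) -> Optional[bool]:
    """Return True/False for valid/invalid, or None when validation is skipped."""
    snippet = template.get("mapping", {})
    # Rule 1: a {{name}} placeholder cannot be resolved up front, so skip validation.
    if "{{name}}" in json.dumps(snippet):
        return None
    declared = template.get("match_mapping_type")
    # Rule 2: without match_mapping_type, try every mapping type.
    candidates = [declared] if declared else ALL_MAPPING_TYPES
    # Valid as soon as parsing succeeds for a single mapping type.
    return any(try_parse(snippet, mapping_type) for mapping_type in candidates)


def toy_parse(snippet: Dict, mapping_type: str) -> bool:
    """Toy stand-in parser: rejects analyzers it does not know about."""
    return snippet.get("analyzer", "standard") in {"standard", "keyword"}


ok = validate_dynamic_template({"mapping": {"type": "text"}}, toy_parse)
bad = validate_dynamic_template({"mapping": {"type": "text", "analyzer": "nope"}}, toy_parse)
skipped = validate_dynamic_template({"mapping": {"type": "text", "copy_to": "{{name}}_raw"}}, toy_parse)
```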
Josh Devins 4ff5e03c70
Adds recall@k metric to rank eval API (#52577)
This change adds the recall@k metric and refactors precision@k to match
the new metric.

Recall@k is an important metric to use for learning to rank (LTR)
use-cases. Candidate generation or first ranking phase ranking functions
are often optimized for high recall, in order to generate as many
relevant candidates in the top-k as possible for a second phase of
ranking. Adding this metric allows tuning that base query for LTR.

See: https://github.com/elastic/elasticsearch/issues/51676
2020-02-27 10:43:42 +01:00
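For readers unfamiliar with the two metrics, here is a minimal sketch of the textbook definitions of precision@k and recall@k. The function names and document IDs are hypothetical, and the rank eval API's handling of edge cases (for example unrated documents or relevance thresholds) may differ from this simplified version.

```python
from typing import Sequence, Set


def precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k retrieved documents that are relevant."""
    top = ranked[:k]
    if not top:
        return 0.0
    hits = sum(1 for doc_id in top if doc_id in relevant)
    return hits / len(top)


def recall_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of all relevant documents that appear in the top k."""
    if not relevant:
        return 0.0
    hits = sum(1 for doc_id in ranked[:k] if doc_id in relevant)
    return hits / len(relevant)


ranked = ["d1", "d2", "d3", "d4"]   # hypothetical ranking returned by a query
relevant = {"d1", "d4", "d9"}       # hypothetical relevance judgments
```

With these inputs, widening k from 2 to 4 raises recall because a second relevant document enters the top-k, which is the property a candidate-generation query is tuned for.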
Costin Leau 3e039282bf
EQL: Hook engine to Elasticsearch (#52828)
Add query execution and return actual results returned from
Elasticsearch inside the tests
2020-02-27 11:16:26 +02:00
David Turner e2cda1a279
"Adding nodes" instructions only work on localhost (#52677)
The introductory sections of the reference manual contain some simplified
instructions for adding a node to the cluster. Unfortunately they are a little
too simplified and only really work for clusters running on `localhost`. If you
try and follow these instructions for a distributed cluster then the new node
will, confusingly, auto-bootstrap itself into a distinct one-node cluster.

Multiple nodes running on localhost is a valid config, of course, but we should
spell out that these instructions are really only for experimentation and that
it takes a bit more work to add nodes to a distributed cluster. This commit
does so.

Also, the "important config" instructions for discovery say that you MUST set
`discovery.seed_hosts` whereas in fact it is fine to ignore this setting and
use a dynamic discovery mechanism instead. This commit weakens this statement
and links to the docs for dynamic discovery mechanisms.

Finally, this section is also overloaded with some technical details that are
not important for this context and are adequately covered elsewhere, and
completely fails to note that the default discovery port is 9300. This commit
addresses this.
2020-02-27 08:51:17 +00:00
James Rodewig fb64c18ac6
[DOCS] Update term vectors snippet to prevent CI failure (#52819)
Adds the `?refresh=wait_for` query argument to an index API snippet in
the term vectors API docs.

This should ensure the document is indexed and available before a
subsequent term vectors API request executes.

Fixes #52814.
2020-02-26 12:35:05 -05:00
Lisa Cawley 42fbca7dc6
[DOCS] Adds cat datafeeds API (#52738) 2020-02-26 09:20:36 -08:00
Hendrik Muhs 854f698cec
Percentiles aggregation: disallow specifying same percentile v… (#52257)
Disallow specifying the same percentile multiple times in percentiles
aggregation

Related: #51871
2020-02-26 13:40:33 +01:00
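The check this commit describes amounts to rejecting a repeated value in the requested percentiles. A minimal sketch, assuming a hypothetical `validate_percents` helper and made-up error wording (the real error message is not given in the commit message):

```python
from typing import Iterable, List


def validate_percents(percents: Iterable[float]) -> List[float]:
    """Return the percents unchanged, or raise if any value is repeated."""
    seen = set()
    result = []
    for percent in percents:
        if percent in seen:
            # Hypothetical wording; the actual message may differ.
            raise ValueError(f"percent [{percent}] has been specified more than once")
        seen.add(percent)
        result.append(percent)
    return result


accepted = validate_percents([50.0, 95.0, 99.0])

try:
    validate_percents([50.0, 95.0, 50.0])
    rejected = False
except ValueError:
    rejected = True
```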
Bogdan Pintea 451c341e01
remove references to the SQL API from ODBC config (#52765)
Remove reference to an "SQL API" which could suggest that one needs to
treat this in a special way when configuring the ODBC driver.
2020-02-26 13:32:22 +01:00
István Zoltán Szabó 490e8b47e6
[DOCS] Adds cat data frame analytics API (#52764)
Co-authored-by: Lisa Cawley <lcawley@elastic.co>
2020-02-26 11:09:37 +01:00
Lisa Cawley 729fe26c5a
[DOCS] Fixes monitoring links (#52790) 2020-02-25 16:57:38 -08:00
Lisa Cawley cd069a861c
[DOCS] Updates custom rules example (#52731) 2020-02-25 09:30:14 -08:00
Andrei Stefan 556f5fa33b
SQL: Use calendar_interval of 1d for HISTOGRAMs with 1 DAY intervals (#52749) 2020-02-25 17:33:36 +02:00
Pius 4de0e6683f
Update ilm-settings.asciidoc (#51577) 2020-02-25 10:17:07 -05:00
bellengao ef88f77c45
[DOCS] Correct policy name in ILM docs example (#52354)
Updates an example snippet to use a consistent policy name.
2020-02-25 09:35:26 -05:00
David Pilato e51b8a51aa
[DOCS] Fix typo in CSV processor docs (#52649)
Corrects an example array in a snippet of the CSV processor docs.
2020-02-25 08:47:58 -05:00
bellengao 21061f7479
[DOCS] Fix typo in ingest node docs (#52671) 2020-02-25 07:51:02 -05:00
James Rodewig ae1aafa302
[DOCS] Add admonition for app using cat APIs (#52727)
Adds an explicit "important" admonition discouraging apps from using
cat APIs.

cat APIs are intended for human consumption via the command line or
Kibana console only. They are not intended for consumption by
applications.
2020-02-25 07:19:35 -05:00
David Roberts ca80ad69f2
[ML] Use event.timezone in file_structure_finder ingest pipeline (#52720)
This is because beat.timezone was renamed to event.timezone in
elastic/beats#9458
2020-02-25 12:18:53 +00:00
James Rodewig 12ed6f12e9
[DOCS] Document `include_in_*` nested mapping parms (#52648)
Adds documentation for the `include_in_parent` and `include_in_root`
mapping parameters for the `nested` mapping datatype.
2020-02-25 07:12:34 -05:00
Adrien Grand 93de946e60
Discourage from opting in for the `niofs` store. (#52638)
Indices opened with the `niofs` store type load much more data on-heap than
indices opened with the `mmapfs` store type. This limitation is now documented
and examples have been updated to show how to update settings to use the
`mmapfs` store type rather than `niofs`.
2020-02-25 08:52:53 +01:00
Adrien Grand b2ff78dec7
Clarify the resiliency trade-off of disabling replicas to speed up indexing. (#52714)
We should be more explicit about the downsides of disabling replicas and
explain that users should be ready to re-do the entire load in case of
issues mid-way.
2020-02-25 08:52:33 +01:00
Adrien Grand b30dbfe9a7
Document how CCR may be used to speed up indexing. (#52717)
One architecture that we have recommended to several users to speed up
indexing involved using CCR to prevent searching from stealing resources
from indexing.
2020-02-25 08:52:04 +01:00
Julie Tibshirani 19197ddde1
Correct the name of the search timeout parameter. (#52733)
The request body parameter is called 'timeout', not 'search_timeout'.
2020-02-24 14:57:49 -08:00
lcawl b590b49205 [DOCS] Adds anchor for custom rules 2020-02-24 10:04:34 -08:00
Andrei Stefan 928b11a34e
SQL: use a calendar interval for histograms over 1 month intervals (#52586) 2020-02-24 18:02:34 +02:00
Mayya Sharipova 556ee9a719
Correct boost calculation in script_score query (#52478)
Previously, the boost in the script_score query was wrongly applied only to
the subquery. This commit makes sure that the boost is applied to the whole
score that comes out of the script.

Closes #48465
2020-02-24 10:46:33 -05:00
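The difference can be shown with a small worked example. This is a simplified model, not the query's actual code: it assumes the old behavior scaled only the subquery score that is fed to the script, while the corrected behavior scales the script's final output. All function names are hypothetical.

```python
from typing import Callable


def score_before_fix(subquery_score: float, boost: float,
                     script: Callable[[float], float]) -> float:
    """Old behavior (as modeled here): boost only scales the subquery score."""
    return script(subquery_score * boost)


def score_after_fix(subquery_score: float, boost: float,
                    script: Callable[[float], float]) -> float:
    """Corrected behavior: boost scales the whole score produced by the script."""
    return boost * script(subquery_score)


def constant_script(_score: float) -> float:
    """A script that ignores the subquery score entirely."""
    return 10.0


old = score_before_fix(1.5, 2.0, constant_script)
new = score_after_fix(1.5, 2.0, constant_script)
```

With a script that ignores the subquery score, the boost has no effect at all under the old model, whereas the corrected model doubles the final score.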
James Rodewig 841d961b58
[DOCS] Document CCS-supported APIs (#52708)
Explicitly notes the Elasticsearch API endpoints that support CCS.

This should deter users from attempting to use CCS with other API
endpoints, such as `GET <index>/_doc/<_id>`.
2020-02-24 09:54:33 -05:00
James Rodewig 7f1d05c453
[DOCS] Correct multi search API docs (#52523)
* Adds an example request to the top of the page.
* Relocates several parameters erroneously listed under "Request body"
to the appropriate "Query parameters" section.
* Updates the "Request body" section to better document the NDJSON
  structure of msearch requests.
2020-02-24 07:41:53 -05:00
Ignacio Vera 3b004b1936
Add support for multipoint shape queries (#52564) 2020-02-24 12:42:17 +01:00
Marios Trivyzas 2eb986488a
[Docs] Clarify default value for `allow_no_indices` (#52635)
Add default value to each one of the usages of `allow_no_indices`
since it differs between different APIs.

Relates to: #52534
2020-02-24 11:37:29 +01:00
Benjamin Trent 20f54272f0
[ML] Adds feature importance option to inference processor (#52218)
This adds machine learning model feature importance calculations to the inference processor. 

The new flag in the configuration matches the analytics parameter name: `num_top_feature_importance_values`
Example:
```
"inference": {
   "field_mappings": {},
   "model_id": "my_model",
   "inference_config": {
      "regression": {
         "num_top_feature_importance_values": 3
      }
   }
}
```

This will write to the document as follows:
```
"inference" : {
   "feature_importance" : { 
      "FlightTimeMin" : -76.90955548511226,
      "FlightDelayType" : 114.13514762158526,
      "DistanceMiles" : 13.731580450792187
   },
   "predicted_value" : 108.33165831875137,
   "model_id" : "my_model"
}
```

This is done through calculating the [SHAP values](https://arxiv.org/abs/1802.03888). 

It requires that models have populated `number_samples` for each tree node. This is not available to models that were created before 7.7. 

Additionally, if the inference config requests feature_importance and not all nodes have been upgraded yet, the pipeline cannot be created. This is a safeguard for mixed-version environments where only some ingest nodes have been upgraded.

NOTE: the algorithm is a Java port of the one laid out in ml-cpp: https://github.com/elastic/ml-cpp/blob/master/lib/maths/CTreeShapFeatureImportance.cc

usability blocked by: https://github.com/elastic/ml-cpp/pull/991
2020-02-21 16:36:21 -05:00
Nik Richers 576bcf10f7
[DOCS] Switch to standard ESS trial links (#52552)
Switches ESS trial sign-up links over to a standard attribute. This provides better metrics for how effective these links are.
2020-02-21 12:04:39 -05:00
James Rodewig 8c50523d9d
[DOCS] Add missing `indices` parms returned by `_nodes/stats` (#52055)
Adds several human-readable `indices` parameters returned by the
`_nodes/stats` API.
2020-02-21 08:07:28 -05:00
Andrei Stefan 477b0eda83
SQL: specify command to run the CLI on a remote machine without Elasticsearch (#52626) 2020-02-21 13:26:31 +02:00
István Zoltán Szabó 14555ca01e
[DOCS] Links transforms in aggregation docs (#52563)
Co-authored-by: Lisa Cawley <lcawley@elastic.co>
2020-02-21 08:22:04 +01:00
Lisa Cawley 0ff9c6eef1
[DOCS] Adds X-Pack usage API (#52496) 2020-02-20 09:25:57 -08:00
Ignacio Vera e2b410e15e
Add support for multipoint geoshape queries (#52133)
Currently multi-point queries are not supported when indexing your data using the BKD-backed geoshape strategy. This commit removes this limitation.
2020-02-20 08:53:01 +01:00
Russ Cam 94f6f946ef
Specify name on enrich.get_policy as list type (#50217)
This commit updates the enrich.get_policy API to specify name
as a list, in line with other URL parts that accept a comma-separated
list of values. 

In addition, update the get enrich policy API docs
to align the URL part name in the documentation with
the name used in the REST API specs.
2020-02-20 12:33:06 +11:00
Lee Hinman 18c98ea759
Correct SLM retention timezone documentation (#52533)
The documentation erroneously said that retention is run in the master node's timezone; however, it is actually
run in UTC.
2020-02-19 13:45:32 -07:00