1120 Commits

Author SHA1 Message Date
Sohail Mirza 9117f0e42a
Docs: Remove extraneous backtick (#86750) 2022-05-16 10:49:22 +02:00
Nik Everett a589456b81
Synthetic source (#85649)
This attempts to shrink the index by implementing a "synthetic _source" field.
You configure it in the mapping:
```
{
  "mappings": {
    "_source": {
      "synthetic": true
    }
  }
}
```

And we just stop storing the `_source` field - kind of. When you go to access
the `_source` we regenerate it on the fly by loading doc values. Doc values
don't preserve the original structure of the source you sent so we have to
make some educated guesses. And we have a rule: the source we generate would
result in the same index if you sent it back to us. That way you can use it
for things like `_reindex`.

Fetching the `_source` from doc values does slow down loading somewhat. See
numbers further down.

## Supported fields
This only works for the following fields:
* `boolean`
* `byte`
* `date`
* `double`
* `float`
* `geo_point` (with precision loss)
* `half_float`
* `integer`
* `ip`
* `keyword`
* `long`
* `scaled_float`
* `short`
* `text` (when there is a `keyword` sub-field that is compatible with this feature)


## Educated guesses

The synthetic source generator makes `_source` fields that are:
* sorted alphabetically
* as "objecty" as possible
* pushes all arrays to the "leaf" fields
* sorts most array values
* removes duplicate text and keyword values

These are mostly artifacts of how doc values are stored.

### sorted alphabetically
```
{
  "b": 1,
  "c": 2,
  "a": 3
}
```
becomes
```
{
  "a": 3,
  "b": 1,
  "c": 2
}
```

### as "objecty" as possible
```
{
  "a.b": "foo"
}
```
becomes
```
{
  "a": {
    "b": "foo"
  }
}
```

### pushes all arrays to the "leaf" fields
```
{
  "a": [
    {
      "b": "foo",
      "c": "bar"
    },
    {
      "c": "bort"
    },
    {
      "b": "snort"
    }
  ]
}
```
becomes
```
{
  "a": {
    "b": ["foo", "snort"],
    "c": ["bar", "bort"]
  }
}
```

### sorts most array values
```
{
  "a": [2, 3, 1]
}
```
becomes
```
{
  "a": [1, 2, 3]
}
```

### removes duplicate text and keyword values
```
{
  "a": ["bar", "baz", "baz", "baz", "foo", "foo"]
}
```
becomes
```
{
  "a": ["bar", "baz", "foo"]
}
```
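
The five rules above can be tried together in a rough Python sketch. This is not the Elasticsearch implementation: `synthesize` is a hypothetical helper that works on plain dicts, assumes every leaf field holds values of a single type, and skips nulls to keep sorting simple.

```python
def synthesize(source):
    """Approximate the documented 'educated guesses' on a plain dict.

    Hypothetical helper for illustration only; the real generator reads
    doc values, not the original JSON.
    """
    leaves = {}

    def collect(path, value):
        if isinstance(value, dict):
            for key, child in value.items():
                # "a.b" and {"a": {"b": ...}} end up on the same path
                collect(path + key.split("."), child)
        elif isinstance(value, list):
            # arrays of objects are flattened onto their leaf fields
            for item in value:
                collect(path, item)
        elif value is not None:
            # nulls are skipped (an assumption of this sketch)
            leaves.setdefault(tuple(path), []).append(value)

    collect([], source)

    result = {}
    for path in sorted(leaves):  # alphabetical field order
        values = sorted(leaves[path])  # sort array values
        if all(isinstance(v, str) for v in values):
            values = sorted(set(values))  # drop duplicate strings only
        node = result
        for key in path[:-1]:
            node = node.setdefault(key, {})  # as "objecty" as possible
        # a single value is emitted as a scalar (an assumption of this sketch)
        node[path[-1]] = values[0] if len(values) == 1 else values
    return result
```

Running it on the examples above reproduces the outputs shown. For instance, `synthesize({"a.b": "foo"})` returns `{"a": {"b": "foo"}}`.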

## `_recovery_source`

Elasticsearch's shard "recovery" process needs `_source` *sometimes*. So does
cross cluster replication. If you disable source or filter it somehow we store
a `_recovery_source` field for as long as the recovery process might need it.
When everything is running smoothly that's generally a few seconds or minutes.
Then the field is removed on merge. This synthetic source feature continues
to produce `_recovery_source` and relies on it for recovery. It's *possible*
to synthesize `_source` during recovery but we don't do it.

That means that synthetic source doesn't speed up writing the index. But in the
future we might be able to turn this on to trade writing less data at index
time for slower recovery and cross cluster replication. That's an area of
future improvement.

## perf numbers

I loaded the entire tsdb data set with this change and compared the size:

```
           standard -> synthetic
store size  31.0 GB ->  7.0 GB  (77.5% reduction)
_source  24695.7 MB -> 47.6 MB  (99.8% reduction - synthetic is in _recovery_source)
```

A second _forcemerge a few minutes after rally finishes should remove the
remaining 47.6MB of _recovery_source.

With this change, fetching source for 1,000 documents seems to take about 500ms. I
spot checked a lot of different areas and haven't seen any different hit. I
*expect* this performance impact is based on the number of doc values fields
in the index and how sparse they are.
2022-05-10 07:46:58 -04:00
Julie Tibshirani 10aa947707 Remove out-of-date note about kNN with filters
We implemented this in #84734 but forgot to update these docs.
2022-04-14 10:18:07 -07:00
Yannick Welsch 78789e2b5d
Fix wildcard highlighting on match_only_text (#85500)
Fixes a bug where match_only_text fields were ignored during highlighting when a field name with wildcard was specified.

Closes #85493
2022-04-01 08:12:08 +02:00
Craig Taverner 0b84eb1a53
Added buffer to vector tile REST API docs (#85460) 2022-03-30 14:29:01 +02:00
Alan Woodward a5452603cc
Extra testing and some cleanups for filtering on field caps (#85068)
* adds a test for mixed cluster requests
* fixes a bad stream version check (above test will fail if this isn't included)
* replaces private FieldCapsFilter interface with Predicate
* renames 'allowedTypes' to 'types' to maintain consistency with external API
* adds javadoc to ResponseRewriter
* removes isRuntimeField from FieldTypeLookup

Relates to #83636
2022-03-29 11:38:52 +01:00
Ignacio Vera a780558e4c
[DOCS] Fix Vector tiles search docs for features.id (#85067)
Removes the `features.id` property from the response body. This property was actually generated by the tool used to decode the mvt file to JSON.
2022-03-17 16:06:49 -04:00
Ignacio Vera 3f6d460d01
Integrate GeoHexGridAggregation with vector tiles API (#84553)
This commit adds a new optional parameter on the vector tiles API called `grid_agg` with two
possible values, `geotile` (default) and `geohex`. This allows building the aggs layer using different
grid aggregations; for example, we can have a grid aggregation that is built using hexagons.
2022-03-16 11:16:30 +01:00
Julie Tibshirani 15708d5454
Integrate filtering support for ANN (#84734)
This PR integrates support for ANN with filtering added in Lucene 9.1. It adds
a new `filter` section to the `_knn_search` endpoint, which accepts a query (in
the Elasticsearch query DSL). The value can either be a single query or a list
of queries, which matches the syntax we use for defining filter clauses in a
`bool` query.
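
As a rough illustration of the two accepted shapes, the Python sketch below builds a `_knn_search` request body. The helper name `knn_search_body` and the field names in the usage note are hypothetical. The `knn` parameter names (`field`, `query_vector`, `k`, `num_candidates`) come from the endpoint's API docs rather than from this commit message, so treat them as an assumption here.

```python
import json


def knn_search_body(field, query_vector, k, num_candidates, filters=None):
    """Build a request body for the `_knn_search` endpoint.

    `filters` may be a single query dict or a list of query dicts,
    mirroring the filter clause of a `bool` query.
    """
    body = {
        "knn": {
            "field": field,
            "query_vector": list(query_vector),
            "k": k,
            "num_candidates": num_candidates,
        }
    }
    if filters is not None:
        # passed through unchanged: a single query or a list of queries
        body["filter"] = filters
    return body


if __name__ == "__main__":
    # hypothetical field names, for illustration only
    single = knn_search_body("image_vector", [0.3, 0.1, 1.2], 10, 100,
                             filters={"term": {"file_type": "png"}})
    print(json.dumps(single, indent=2))
```

Passing a list such as `[{"term": {...}}, {"range": {...}}]` as `filters` produces the list form instead.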

Closes #81788.
2022-03-10 15:53:51 -08:00
Craig Taverner 397eccf789
Added buffer pixels to vector tile spec parsing (#84710)
* Added buffer pixels to vector tile spec parsing

Previously this was hard-coded to 5, but now is configurable using the
format z/x/y@extent:buffer, where both extent and buffer are optional
and default to 4096 and 5 pixels respectively.
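
A minimal Python sketch of parsing that format follows. `parse_tile_spec` is a hypothetical helper, not the code in this change, and it assumes a buffer is only given together with an extent (`z/x/y`, `z/x/y@extent`, or `z/x/y@extent:buffer`).

```python
def parse_tile_spec(spec, default_extent=4096, default_buffer=5):
    """Parse 'z/x/y@extent:buffer'; extent and buffer are optional."""
    tile, _, rest = spec.partition("@")
    z, x, y = (int(part) for part in tile.split("/"))
    extent, buffer = default_extent, default_buffer
    if rest:
        extent_text, _, buffer_text = rest.partition(":")
        if extent_text:
            extent = int(extent_text)
        if buffer_text:
            buffer = int(buffer_text)
    return {"z": z, "x": x, "y": y, "extent": extent, "buffer": buffer}
```

With these defaults, `parse_tile_spec("3/2/1")` yields an extent of 4096 and a buffer of 5, while `parse_tile_spec("3/2/1@256:10")` yields 256 and 10.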

Co-authored-by: James Rodewig <james.rodewig@elastic.co>
2022-03-10 16:42:29 +01:00
Julie Tibshirani 713017f0e3
Improve readability of field retrieval docs (#84373)
* Collapse more specialized sections around nested fields, unmapped fields, and
  ignored values
* Move information on metadata fields to a 'note' and streamline it a bit

Closes #82983.
2022-02-28 09:52:39 -08:00
James Rodewig 6f5541a9d6
[DOCS] Update CCS forward compatibility docs (#84055)
Documents the following:

* FWC for CCS within the same major version.
* A local cluster running the last minor of a major can search a remote cluster running any minor in the following major.
* Only features that exist across all searched clusters are supported.
2022-02-28 08:18:04 -05:00
Julie Tibshirani d9ef39f7c2
Remove 'under development' note in suggester docs (#84366)
In the intro, we mention that parts of the feature are still under development.
This is not very helpful information for users, and could give the wrong
impression about its maturity.
2022-02-24 13:27:03 -08:00
Nhat Nguyen 86964c9752
Document partial search results with skip_unavailable (#84057)
This commit adds an explanation for the relation between `allow_partial_search_results` and `skip_unavailable` in CCS requests.

Relates to #33915

Closes #82407

Co-authored-by: James Rodewig <james.rodewig@elastic.co>
2022-02-23 10:04:52 -05:00
Nhat Nguyen 31d703f24c
Introduce lookup runtime fields (#82385)
This PR introduces the lookup runtime fields which are used to retrieve 
data from the related indices. The below search request enriches its
search hits with the location of each IP address from the `ip_location`
index.

```
POST logs/_search
{
  "runtime_mappings": {
    "location": {
      "type": "lookup",
      "lookup_index": "ip_location",
      "query_type": "term",
      "query_input_field": "ip",
      "query_target_field": "_id",
      "fetch_fields": [
        "country",
        "city"
      ]
    }
  },
  "fields": [
    "timestamp",
    "message",
    "location"
  ]
}
```

Response:

```
{
  "hits": {
    "hits": [
      {
        "_index": "logs",
        "_id": "1",
        "fields": {
          "location": [
            {
              "city": [ "Montreal" ],
              "country": [ "Canada" ]
            }
          ],
          "message": [ "the first message" ]
        }
      }
    ]
  }
}
```
2022-02-22 21:36:19 -05:00
Alan Woodward 8bc46ad959
Add filtering to fieldcaps endpoint (#83636)
Many consumers of the field caps API need to do some post-processing of the
results before they can use them; for instance, Kibana would like to exclude
multifields from certain field selections, or would like to display only geo_point
fields in Maps. ML and QL consumers exclude nested fields in certain
circumstances. This post-processing is possible at the moment, but can be
hacky; and in all cases it involves sending the whole (possibly very large) field
caps response over the wire and then whittling it down in the client. It is also not
guaranteed to be accurate - runtime fields may be incorrectly classified as multifields,
for example.

This commit pushes filtering into Elasticsearch itself, reducing the amount of data
that needs to be transported and ensuring better accuracy. The field caps API gets
two new parameters:

* filters - a comma-delimited list that may contain any combination of: `+metadata`,
  `-metadata`, `-nested`, `-parent`, `-multifield`
* types - a comma-delimited list of field types; only fields that have a type in this set
  will be returned

The API will make best-effort attempts to apply the filters post-hoc to responses from
older nodes, so this should still work in a mixed-cluster or cross-cluster situation.
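
As a sketch of how a client might use the two parameters, the Python helper below builds the request path. `field_caps_path` is a hypothetical name; the `filters` and `types` parameter names and the allowed filter values are taken from the list above.

```python
from urllib.parse import urlencode

# filter values listed in this commit message
ALLOWED_FILTERS = {"+metadata", "-metadata", "-nested", "-parent", "-multifield"}


def field_caps_path(fields="*", filters=(), types=()):
    """Build a `_field_caps` request path with optional filters and types."""
    unknown = set(filters) - ALLOWED_FILTERS
    if unknown:
        raise ValueError(f"unsupported filters: {sorted(unknown)}")
    params = {"fields": fields}
    if filters:
        params["filters"] = ",".join(filters)
    if types:
        params["types"] = ",".join(types)
    # keep "," and "*" readable; "+" is still percent-encoded as %2B
    return "/_field_caps?" + urlencode(params, safe=",*")
```

For example, `field_caps_path(filters=["-nested", "-multifield"], types=["keyword", "geo_point"])` returns `/_field_caps?fields=*&filters=-nested,-multifield&types=keyword,geo_point`.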

Fixes #82966, #72174
2022-02-10 14:06:26 +00:00
James Rodewig 2f03112b5b
[DOCS] Synced with 8.0 stack upgrade changes (#83489) (#83596)
This moves the bulk of the upgrade information into the consolidated upgrade guide, but leaves the primary upgrade topic in place as a cross reference.

Relates to: https://github.com/elastic/stack-docs/pull/1970

Co-authored-by: gchaps <33642766+gchaps@users.noreply.github.com>
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
Co-authored-by: James Rodewig <40268737+jrodewig@users.noreply.github.com>
(cherry picked from commit f6473d71f9)

Co-authored-by: debadair <debadair@elastic.co>
2022-02-07 11:01:42 -05:00
James Rodewig 882bac8948 [DOCS] Fix CCS compatibility typo 2022-02-02 13:09:40 -05:00
Eric Beahan 540a40093c
[DOCS] Correct header syntax (#83275)
* correct header syntax

* Update docs/reference/search/search-your-data/retrieve-selected-fields.asciidoc

Co-authored-by: Adam Locke <adam.locke@elastic.co>
2022-01-28 14:55:54 -05:00
Julie Tibshirani e7ba03e0a6
Add notes on indexing to kNN search guide (#83188)
This change adds a new 'indexing considerations' section that explains why index
calls can be slow and how force merge can help search latency.
2022-01-28 10:23:35 -08:00
Christoph Büscher 61e1b080dd
[Docs] Add supported _terms_enum field types (#83244)
The `_terms_enum` API currently supports keyword, constant_keyword and
flattened fields. This should be documented more clearly.
2022-01-28 12:47:12 +01:00
James Rodewig dfb9f6f18d
[DOCS] Document 8.0 BWC support for CCS (#80809)
As of 8.0, the compatibility window for cross-cluster search (CCS) to an earlier release will be one minor release. This updates the CCS docs and adds a related 8.0 breaking change.

Closes https://github.com/elastic/elasticsearch/issues/80782
2022-01-11 10:33:12 -05:00
James Rodewig 7142b47e69
[DOCS] Add prerequisites for CCS (#81782)
* Adds a prerequisites section covering remote cluster config, node roles, and security.
* Moves existing content about remote cluster config to the prereqs.
* Updates the remote cluster docs to include information about eligible gateway nodes and tagging for gateway nodes.

Closes https://github.com/elastic/elasticsearch/issues/72001
2022-01-10 09:17:44 -05:00
Bogdan Pintea 13a0e420a3
SQL: Add CCS SQL documentation (#81545)
This adds the documentation for CCS SQL.

Co-authored-by: James Rodewig <james.rodewig@elastic.co>
2022-01-05 20:01:01 +01:00
Ignacio Vera 8c6ed1efc0
Remove experimental flag from geo field format mvt (#81721)
Small leftover from 7.16, where the mvt feature became GA.
2021-12-14 15:21:05 +01:00
Julie Tibshirani 19eed47159
Improve kNN error message when index is disabled (#81561)
In order to perform a kNN search on a `dense_vector` field, it must have
`index: true` in its mapping. This commit clarifies the error message. Before
the message was confusing, because the user likely didn't touch the `index`
parameter and might not even be aware of it.

It adds a note to the docs clarifying that when coming from 7.x, you must
explicitly update `index: true` and reindex the vectors.

Relates to #78473.
2021-12-08 16:20:35 -08:00
Nhat Nguyen d0d91c690e
Handle partial search result with point in time (#81349)
Today, a search request with PIT would fail immediately if any 
associated indices or nodes are gone, which is inconsistent when
allow_partial_search_results is true.

Relates #81256
2021-12-08 10:04:38 -05:00
James Rodewig 229d2d7a77
[DOCS] Add high-level guide for kNN search (#80857)
Adds a high-level guide for running an approximate or exact kNN search in Elasticsearch.

Relates to https://github.com/elastic/elasticsearch/issues/78473.
2021-11-30 14:17:39 -05:00
happybin92 0aa9767f3d
Support combining _shards preference param with <custom-string> (#80024)
Adds support for combining the _shards search preference parameter with the <custom-string> search preference parameter.

Closes #80021
2021-11-10 14:08:27 +01:00
Julie Tibshirani 8ca693b271
Add docs for kNN search endpoint (#80378)
This commit adds docs for the new `_knn_search` endpoint.

It focuses on being an API reference and is light on details in terms of how
exactly the kNN search works, and how the endpoint contrasts with
`script_score` queries. We plan to add a high-level guide on kNN search that
will explain this in depth.

Relates to #78473.
2021-11-09 09:28:12 -08:00
James Rodewig f56a0f4b66
[DOCS] Remove `testenv` annotations from doc snippet tests (#80023)
Removes `testenv` annotations and related code. These annotations originally let you skip x-pack snippet tests in the docs. However, that's no longer possible.

Relates to #79309, #31619
2021-11-05 18:38:50 -04:00
Ignacio Vera 508ed02ed2
Document _key tag added on the agg layer features (#80205) 2021-11-03 07:12:46 +01:00
James Rodewig ee1f71d421
[DOCS] Add experimental label to TSDB mapping params and settings (#79647)
Adds an `experimental` annotation to the following:

* `time_series_metric` mapping parameter
* `time_series_dimension` mapping parameter
* `index.mapping.dimension_fields.limit` index setting
* `time_series_dimension` and `time_series_metric` properties in the field caps API response
2021-10-27 09:09:54 -04:00
Dan Hermann 4a36d5cd79
Remove endpoint for freezing indices (#78918) 2021-10-26 06:37:56 -05:00
James Rodewig 7940e0777c
[DOCS] Re-add several query params to search API docs (#79716)
PR #55884 removed documentation for several query parameters from the search API
docs. During tests, I failed to notice that these are valid parameters but require other parameters to use.

Changes:

* Notes the following search API parameters require the `q` query string parameter:

  * `analyzer`
  * `analyze_wildcard`
  * `default_operator`
  * `df`
  * `lenient`

* Notes the following search API parameters require the `suggest_field` and `suggest_text` query parameters:

  * `suggest_mode`
  * `suggest_size`

* Re-adds the above parameters to the search API docs.

These changes also affect API documentation that reuses the search API parameters:

* Delete by query API
* Update by query API
* Count API
* Explain API
* Validate API

Closes #79674
2021-10-25 11:58:54 -04:00
Gilad Gal 5276ee9b8b Update search-vector-tile-api.asciidoc
Vector tiles are GA in 7.16
2021-10-20 09:21:51 -04:00
Christoph Büscher 8b56362dbf
[Docs] Retrieving metadata using the fields option (#79174)
Adding a small section to the field retrieval article about which metadata
fields now can be retrieved via the `fields` option.
2021-10-15 11:47:17 +02:00
Igor Motov f6034e643a
TSDB: Add time series information to field caps (#78790)
Exposes information about dimensions and metrics via field caps. This
information will be needed for PromQL support.

Relates to #74660
2021-10-13 11:03:38 -10:00
markharwood 0f9848e243
BWC change following backport of PR 78697 to 7.x (#79067)
BWC change following backport of PR 78697 to 7.x
Closes #74121
2021-10-13 15:18:25 +01:00
markharwood 228992bf7e
Search - return ignored field values from fields api. (#78697)
Since Kibana's Discover switched to retrieving values via the fields API rather than source there have been gaps in the display caused by "ignored" fields (those that fall foul of ignore_above and ignore_malformed size and formatting rules).

This PR returns ignored values from source when a user-requested field fails to be parsed for a document. In these cases the corresponding hit adds a new ignored_field_values section in the response.

Closes #74121
2021-10-13 11:05:17 +01:00
Ignacio Vera 920b3b52c2
Add support for metrics aggregations to mvt end point (#78614)
It adds support for several aggregations.
2021-10-05 09:17:25 +02:00
Ignacio Vera e4cde37111
Add centroid grid type in mvt request (#78305)
For this grid type, the features on the aggregation layer are represented by a point that is computed from the
centroid of the data inside the cell.

Co-authored-by: James Rodewig <40268737+jrodewig@users.noreply.github.com>
2021-10-01 06:56:13 +02:00
Ignacio Vera 9033faffff
Add cross cluster search test for mvt end point (#78054)
This commit adds a test to check that cross-cluster search is supported and documents it.

Co-authored-by: James Rodewig <40268737+jrodewig@users.noreply.github.com>
2021-09-23 07:59:44 +02:00
Adam Locke 6940673e8a
[DOCS] Update remote cluster docs (#77043)
* [DOCS] Update remote cluster docs

* Add files, rename files, write new stuff

* Plethora of changes

* Add test and update snippets

* Redirects, moved files, and test updates

* Moved file to x-pack for tests

* Remove older CCS page and add redirects

* Cleanup, link updates, and some rewrites

* Update image

* Incorporating user feedback and rewriting much of the remote clusters page

* More changes from review feedback

* Numerous updates, including request examples for CCS and Kibana

* More changes from review feedback

* Minor clarifications on security for remote clusters

* Incorporate review feedback

Co-authored-by: Yang Wang <ywangd@gmail.com>

* Some review feedback and some editorial changes

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
Co-authored-by: Yang Wang <ywangd@gmail.com>
2021-09-22 16:02:33 -04:00
Ignacio Vera 75b7b0db03
Add track_total_hits support in mvt API (#78074)
This allows consumers of the API to know exactly whether all the features in a tile have been considered
when building the hits layer of a vector tile.

Co-authored-by: James Rodewig <40268737+jrodewig@users.noreply.github.com>
2021-09-22 08:37:50 +02:00
Ignacio Vera 2c98de99f6
Add documentation for _index property in mvt response (#78019)
Document that we are including a _index property for each hit in _mvt response.
2021-09-21 07:13:09 +02:00
edh-oss 62a471aefe
Update JSON parser and snippets (#77983)
Related to issue #77823

This does the following:

- Updates several asciidoc files that contained code snippets with
  invalid JSON, most involving unnecessary trailing commas.

- Makes the switch from the Groovy JSON parser to the Jackson parser,
  pursuant to the general goal of eliminating Groovy dependence.

- Makes testing of JSON validity at build time more strict.

Note that this update still allows backslash escaping for any
character. Currently that matters because of the file
"docs/reference/ml/anomaly-detection/apis/get-datafeed-stats.asciidoc",
specifically this part:

    "attributes" : {
      "ml.machine_memory" :
        "$body.datafeeds.0.node.attributes.ml\.machine_memory",
      "ml.max_open_jobs" : "512"
    }

It's not clear to me what change, if any, is appropriate there. So,
I've left in the escaped period and configured the parser to ignore
it for the time being.
2021-09-20 11:08:26 +01:00
Nik Everett ed522ba26b
Disable bwc tests before backporting #77633 (#77639)
We make wire changes in #77633 so we need to disable the backwards
compatibility tests in master before merging the wire changes. We'll
re-enable them after the backport is merged.
2021-09-13 14:58:27 -04:00
Nik Everett c2c0165fd2
Profile the fetch phase (#77064)
This adds profiling to the fetch phase so we can tell when fetching is
slower than we'd like and we can tell which portion of the fetch is
slow. The output includes which stored fields were loaded, how long it
took to load stored fields, which fetch sub-phases were run, and how
long those fetch sub-phases took.

Closes #75892

* Skip bwc

* Don't compare fetch profiles

* Use passed one

* no npe

* Do last rename

* Move method down

* serialization tests

* Fix sneaky serialization

* Test for sneaky bug

* license header

* Document

* Fix test

* newline

* Restore assertion

* unit test merging

* Handle inner hits

* Fixup

* Revert unneeded

* Revert inner hits profiling

* Fix names

* Fixup names

* Move results building

* Drop loaded_nested

* Checkstyle

* Fixup more

* Finish writeable cleanup

Add unit tests for merge

* Remove null checking builder

* Fix wire mistake

How did this pass before?!

* Rename

* Remove funny builder

* Remove name munging
2021-09-13 10:00:36 -04:00
James Rodewig cfae69717a
[DOCS] Update anchor and add redirect for aliases (#77349)
PRs #73062 and #73043 repurposed the `alias` anchor for a new guide for index
and data stream aliases. Previously, this anchor was used for our field alias
documentation.

Repurposing the anchor has caused continuity errors for users selecting
different versions of the ES docs. It could also cause confusion for users with
a `/current/` link to the `alias` page.

This updates the anchor for the alias guide and adds a redirect page to
disambiguate the `alias` anchor.

It also fixes a bread crumb issue for redirects following the 'Modifying your
Data' redirect page.

Closes #77034.
2021-09-07 09:42:42 -04:00