Commit Graph

874 Commits

Author SHA1 Message Date
Julie Tibshirani c0c53b8724
Small corrections to stored_fields docs. (#53247)
* Fix a reference to the 'field' option.
* Remove claim about detecting script fields.
* Specify that object fields will just be ignored.
2020-03-09 10:57:23 -07:00
James Rodewig fe08296c31
[DOCS] Correct `hits.total.relation` response parm def (#52847)
Fixes a partially completed definition for the `hits.total.relation`
response parameter in the search API docs.
2020-03-04 08:22:40 -05:00
Josh Devins 4ff5e03c70
Adds recall@k metric to rank eval API (#52577)
This change adds the recall@k metric and refactors precision@k to match
the new metric.

Recall@k is an important metric to use for learning to rank (LTR)
use-cases. Candidate generation or first ranking phase ranking functions
are often optimized for high recall, in order to generate as many
relevant candidates in the top-k as possible for a second phase of
ranking. Adding this metric allows tuning that base query for LTR.

See: https://github.com/elastic/elasticsearch/issues/51676
2020-02-27 10:43:42 +01:00
James Rodewig 7f1d05c453
[DOCS] Correct multi search API docs (#52523)
* Adds an example request to the top of the page.
* Relocates several parameters erroneously listed under "Request body"
to the appropriate "Query parameters" section.
* Updates the "Request body" section to better document the NDJSON
  structure of msearch requests.
2020-02-24 07:41:53 -05:00
Marios Trivyzas 2eb986488a
[Docs] Clarify default value for `allow_no_indices` (#52635)
Add default value to each one of the usages of `allow_no_indices`
since it differs between different APIs.

Relates to: #52534
2020-02-24 11:37:29 +01:00
debadair c93b8b91c3
[DOCS] Fixed typo. (#52071) 2020-02-07 11:03:56 -08:00
Jess 97b12c11db [Docs] Small edits to Ranking Evaluation API docs (#51116)
Small updates to grammar, syntax, and unclear wordings.
2020-01-20 10:30:54 +01:00
James Rodewig cfddddda0b
[DOCS] Fix search request body links (#50500)
PR #44238 changed several links related to the Elasticsearch search request body API. This updates several places still using outdated links or anchors.

This will ultimately let us remove some redirects related to those link changes.
2019-12-26 14:20:51 -05:00
Xiang Dai 432bd0e92c Fix docs typos (#50365)
Fixes a few typos in the docs.

Signed-off-by: Xiang Dai 764524258@qq.com
2019-12-23 10:35:14 -05:00
James Rodewig a311018fbc
[DOCS] Remove outdated file scripts refererence (#50437)
File scripts were removed in 6.0 with #24627.

This removes an outdated file scripts reference from the conditional clauses section of the search templates docs.
2019-12-20 14:02:42 -05:00
Adrien Grand 2d627ba757
Add per-field metadata. (#49419)
This PR adds per-field metadata that can be set in the mappings and is later
returned by the field capabilities API. This metadata is completely opaque to
Elasticsearch but may be used by tools that index data in Elasticsearch to
communicate metadata about fields with tools that then search this data. A
typical example that has been requested in the past is the ability to attach
a unit to a numeric field.

In order to not bloat the cluster state, Elasticsearch requires that this
metadata be small:
 - keys can't be longer than 20 chars,
 - values can only be numbers or strings of no more than 50 chars - no inner
   arrays or objects,
 - the metadata can't have more than 5 keys in total.

Given that metadata is opaque to Elasticsearch, field capabilities don't try to
do anything smart when merging metadata about multiple indices, the union of
all field metadatas is returned.

Here is how the meta might look like in mappings:

```json
{
  "properties": {
    "latency": {
      "type": "long",
      "meta": {
        "unit": "ms"
      }
    }
  }
}
```

And then in the field capabilities response:

```json
{
  "latency": {
    "long": {
      "searchable": true,
      "aggreggatable": true,
      "meta": {
        "unit": [ "ms" ]
      }
    }
  }
}
```

When there are no conflicts, values are arrays of size 1, but when there are
conflicts, Elasticsearch includes all unique values in this array, without
giving ways to know which index has which metadata value:

```json
{
  "latency": {
    "long": {
      "searchable": true,
      "aggreggatable": true,
      "meta": {
        "unit": [ "ms", "ns" ]
      }
    }
  }
}
```

Closes #33267
2019-12-18 17:27:38 +01:00
Adrien Grand 1329acc094
Upgrade to lucene 8.4.0-snapshot-662c455. (#50016)
Lucene 8.4 is about to be released so we should check it doesn't cause problems
with Elasticsearch.
2019-12-10 17:09:36 +01:00
Mayya Sharipova fa8b48deef Optimize sort on numeric long and date fields.
This rewrites long sort as a `DistanceFeatureQuery`, which can
efficiently skip non-competitive blocks and segments of documents.
Depending on the dataset, the speedups can be 2 - 10 times.

The optimization can be disabled with setting the system property
`es.search.rewrite_sort` to `false`.

Optimization is skipped when an index has 50% or more data with
the same value.

Optimization is done through:
1. Rewriting sort as `DistanceFeatureQuery` which can
efficiently skip non-competitive blocks and segments of documents.

2. Sorting segments according to the primary numeric sort field(#44021)
This allows to skip non-competitive segments.

3. Using collector manager.
When we optimize sort, we sort segments by their min/max value.
As a collector expects to have segments in order,
we can not use a single collector for sorted segments.
We use collectorManager, where for every segment a dedicated collector
will be created.

4. Using Lucene's shared TopFieldCollector manager
This collector manager is able to exchange minimum competitive
score between collectors, which allows us to efficiently skip
the whole segments that don't contain competitive scores.

5. When index is force merged to a single segment, #48533 interleaving
old and new segments allows for this optimization as well,
as blocks with non-competitive docs can be skipped.

Closes #37043

Co-authored-by: Jim Ferenczi <jim.ferenczi@elastic.co>
2019-11-26 09:24:25 -05:00
Mayya Sharipova e9ba252176 Revert "Optimize sort on long field (#48804)"
This reverts commit 79d9b365c4.
2019-11-26 09:23:27 -05:00
Mayya Sharipova 79d9b365c4
Optimize sort on long field (#48804)
* Optimize sort on numeric long and date fields (#39770)

Optimize sort on numeric long and date fields, when 
the system property `es.search.long_sort_optimized` is true.

* Skip optimization if the index has duplicate data (#43121)

Skip sort optimization if the index has 50% or more data
with the same value.
When index has a lot of docs with the same value, sort
optimization doesn't make sense, as DistanceFeatureQuery
will produce same scores for these docs, and Lucene
will use the second sort to tie-break. This could be slower
than usual sorting.

* Sort leaves on search according to the primary numeric sort field (#44021)

This change pre-sort the index reader leaves (segment) prior to search
when the primary sort is a numeric field eligible to the distance feature
optimization. It also adds a tie breaker on `_doc` to the rewritten sort
in order to bypass the fact that leaves will be collected in a random order.
I ran this patch on the http_logs benchmark and the results are very promising:

```
|                                       50th percentile latency | desc_sort_timestamp |    220.706 |      136544 |   136324 |     ms |
|                                       90th percentile latency | desc_sort_timestamp |    244.847 |      162084 |   161839 |     ms |
|                                       99th percentile latency | desc_sort_timestamp |    316.627 |      172005 |   171688 |     ms |
|                                      100th percentile latency | desc_sort_timestamp |    335.306 |      173325 |   172989 |     ms |
|                                  50th percentile service time | desc_sort_timestamp |    218.369 |     1968.11 |  1749.74 |     ms |
|                                  90th percentile service time | desc_sort_timestamp |    244.182 |      2447.2 |  2203.02 |     ms |
|                                  99th percentile service time | desc_sort_timestamp |    313.176 |     2950.85 |  2637.67 |     ms |
|                                 100th percentile service time | desc_sort_timestamp |    332.924 |     2959.38 |  2626.45 |     ms |
|                                                    error rate | desc_sort_timestamp |          0 |           0 |        0 |      % |
|                                                Min Throughput |  asc_sort_timestamp |   0.801824 |    0.800855 | -0.00097 |  ops/s |
|                                             Median Throughput |  asc_sort_timestamp |   0.802595 |    0.801104 | -0.00149 |  ops/s |
|                                                Max Throughput |  asc_sort_timestamp |   0.803282 |    0.801351 | -0.00193 |  ops/s |
|                                       50th percentile latency |  asc_sort_timestamp |    220.761 |     824.098 |  603.336 |     ms |
|                                       90th percentile latency |  asc_sort_timestamp |    251.741 |     853.984 |  602.243 |     ms |
|                                       99th percentile latency |  asc_sort_timestamp |    368.761 |     893.943 |  525.182 |     ms |
|                                      100th percentile latency |  asc_sort_timestamp |    431.042 |      908.85 |  477.808 |     ms |
|                                  50th percentile service time |  asc_sort_timestamp |    218.547 |     820.757 |  602.211 |     ms |
|                                  90th percentile service time |  asc_sort_timestamp |    249.578 |     849.886 |  600.308 |     ms |
|                                  99th percentile service time |  asc_sort_timestamp |    366.317 |     888.894 |  522.577 |     ms |
|                                 100th percentile service time |  asc_sort_timestamp |    430.952 |     908.401 |   477.45 |     ms |
|                                                    error rate |  asc_sort_timestamp |          0 |           0 |        0 |      % |
```

So roughly 10x faster for the descending sort and 2-3x faster in the ascending case. Note
that I indexed the http_logs with a single client in order to simulate real time-based indices
where document are indexed in their timestamp order.

Relates #37043

* Remove nested collector in docs response

As we don't use cancellableCollector anymore, it should be removed from
the expected docs response.

* Use collector manager for search when necessary (#45829)

When we optimize sort, we sort segments by their min/max value.
As a collector expects to have segments in order,
we can not use a single collector for sorted segments.
Thus for such a case, we use collectorManager,
where for every segment a dedicated collector will be created.

* Use shared TopFieldCollector manager

Use shared TopFieldCollector manager for sort optimization.
This collector manager is able to exchange minimum competitive
score between collectors

* Correct calculation of avg value to avoid overflow

* Optimize calculating if index has duplicate data
2019-11-26 09:07:39 -05:00
James Rodewig 1e45db49ec
[DOCS] Document `script_score` float precision limit (#49402)
All document scores are positive 32-bit floating point numbers. However, this
wasn't previously documented.

This can result in surprising behavior, such as precision loss, for users when
customizing scores using the function score query.

This commit updates an existing admonition in the function score query docs to
document the 32-bits precision limit. It also updates the search API reference
docs to note that `_score` is a 32-bit float.
2019-11-21 08:53:56 -05:00
Orhan Toy 53b1bc3933 [Docs] Fix _count HTTP method (#48979) 2019-11-12 15:44:57 +01:00
Patrick Maynard 1ab63cc0d7 [DOCS] Fix typo in search type docs (#48868) 2019-11-11 09:39:46 -05:00
Christoph Büscher 51f89a7184
Remove Ranking Evaluation API experimental status (#48603)
The API has been released long enough to remove the experimental status.
2019-10-29 20:55:48 +01:00
Ian Danforth 6717343b47 [Docs] Fix typo in suggesters search API doc (#48477) 2019-10-29 09:57:17 +01:00
James Rodewig e7e45c5c20
[DOCS] Fix note format in index suggestion docs (#48536) 2019-10-25 10:30:52 -05:00
Christoph Büscher a1ae813410
[Docs] Mention reserved completion suggestion characters (#48445)
We currently don't mention the three reserved characters anywhere. This change
adds a short note mentioning them

Closes #48341
2019-10-25 16:57:51 +02:00
James Rodewig f53eba024b
[DOCS] Remove binary gendered language (#48362) 2019-10-23 09:36:31 -05:00
Jim Ferenczi 8f9e77e6f1
Fix tag in the search request timeout option docs (#47776)
and add missing parentheses `search_timeout` param
2019-10-10 10:35:09 +02:00
James Rodewig e7ffacf8c0
[DOCS] Correct callouts in search template docs (#47655) 2019-10-07 09:25:03 -04:00
James Rodewig 2fd051497e
[DOCS] Add response body parms to search API docs (#47042) 2019-09-30 11:41:14 -04:00
István Zoltán Szabó d0faf354c6
[DOCS] Reformats Profile API (#47168)
* [DOCS] Reformats Profile API.

* [DOCS] Fixes failing docs test.
2019-09-27 10:34:30 +02:00
István Zoltán Szabó 36502b2460
[DOCS] Reformats ranking evaluation API (#46974)
* [DOCS] Reformats ranking evaluation API.
Co-Authored-By: James Rodewig <james.rodewig@elastic.co>
2019-09-25 14:55:09 +02:00
István Zoltán Szabó 69422b97cf
[DOCS] Reformat suggesters page. (#47010) 2019-09-25 14:38:47 +02:00
Alan Woodward c1f99e2d75
Remove `_type` from SearchHit (#46942)
This commit removes the `_type` field from all search hit responses.

Relates to #41059
2019-09-23 19:14:54 +01:00
Alan Woodward b733f9e803
Remove types from explain API (#46926)
We no longer need a type to get the source of a document, so we can remove it from
the explain API as well.

Relates to #41059
2019-09-23 17:55:09 +01:00
István Zoltán Szabó 5dc4dc6e2e
[DOCS] Reformats Field capabilities API (#46866)
* [DOCS] Reformats Field capabilities API.
Co-Authored-By: James Rodewig <james.rodewig@elastic.co>
2019-09-20 11:24:44 +02:00
István Zoltán Szabó b256462bef
[DOCS] Reformats explain API (#46857)
* [DOCS] Reformats explain API.
Co-Authored-By: James Rodewig <james.rodewig@elastic.co>
2019-09-20 10:59:11 +02:00
James Rodewig 9f65af989d
[DOCS] Remove `lowercase_terms` parm from term suggester docs (#46879) 2019-09-19 15:56:24 -04:00
Takumasa Ochi 8b764a5209 Fix typos in `match` in profile API (#46723)
* Replace `matches` with correct `match`
* Use present tense consistently
* Replace `metric` with correct `match`
2019-09-19 16:05:46 +02:00
István Zoltán Szabó e0b19a8ae0
[DOCS] Reformats validate API (#46389)
* [DOCS] Reformats validate API.
Co-Authored-By: James Rodewig <james.rodewig@elastic.co>
2019-09-18 14:29:48 +02:00
István Zoltán Szabó 4e11a19371
[DOCS] Reformats count API (#46377)
* [DOCS] Reformats count API.
Co-Authored-By: James Rodewig <james.rodewig@elastic.co>
2019-09-17 09:53:03 +02:00
James Rodewig 5c78f606c2
[DOCS] Change // CONSOLE comments to [source,console] (#46440) 2019-09-09 10:45:37 -04:00
James Rodewig e43be90e6c
[DOCS] [5 of 5] Change // TESTRESPONSE comments to [source,console-results] (#46449) 2019-09-06 14:05:36 -04:00
James Rodewig 466c59a4a7
[DOCS] Replace "// TESTRESPONSE" magic comments with "[source,console-result] (#46295) 2019-09-05 16:47:18 -04:00
James Rodewig f5827ba0ae
[DOCS] Replace "// CONSOLE" comments with [source,console] (#46159) 2019-09-04 12:51:02 -04:00
István Zoltán Szabó 4a0713aa0b
[DOCS] Reformats search template and multi search template APIs (#46236)
* [DOCS] Reformats search template and multi search template APIs.
Co-Authored-By: James Rodewig <james.rodewig@elastic.co>
2019-09-04 15:12:49 +02:00
István Zoltán Szabó ded27911dd
[DOCS] Reformats search shards API (#46240)
* [DOCS] Reformats search shards API
Co-Authored-By: James Rodewig <james.rodewig@elastic.co>
2019-09-04 11:34:30 +02:00
István Zoltán Szabó c5c033cc1f
[DOCS] Reformats request body search API (#46254)
* [DOCS] Reformats request body search API.
Co-Authored-By: James Rodewig <james.rodewig@elastic.co>
2019-09-04 10:52:17 +02:00
István Zoltán Szabó f6466f4840
[DOCS] Reformats multi search API (#46256)
* [DOCS] Reformats multi search API.

Co-Authored-By: James Rodewig <james.rodewig@elastic.co>
2019-09-04 10:14:30 +02:00
István Zoltán Szabó a6e915b05a
[DOCS] Reformats URI search request (#45844)
* [DOCS] Reformats URI search request.

Co-Authored-By: James Rodewig <james.rodewig@elastic.co>

Co-Authored-By: debadair <debadair@elastic.co>
2019-08-29 10:04:49 +02:00
James Rodewig 46d7849032
Change `{var}` convention to `<var>` (#45904) 2019-08-23 10:57:20 -04:00
Nathan Howard df51be533f Adding a warning to from-size.asciidoc
Customers occasionally discover a known behavior in Elasticsearch's pagination that does not appear to be documented. This warning is intended to educate customers of this behavior while still highlighting alternative solutions.
2019-08-22 19:07:14 -07:00
István Zoltán Szabó 912d740802
[DOCS] Reformats search API (#45786)
Co-Authored-By: James Rodewig <james.rodewig@elastic.co>
2019-08-22 15:04:20 +02:00
James Rodewig 26323f0db3
[DOCS] Add template docs to scripts. Reorder template examples. (#45817)
* [DOCS] Add template docs to scripts. Reorder template examples.

* Adds a 'Search template' section to the 'How to use scripts' chapter.
  This links to the 'Search template' chapter for detailed info and
  examples.

* Reorders and retitles several examples in the 'Search template'
  chapter. This is primarily to make examples for storing, deleting, and
  using search templates more prominent.

* Change <templatename> to <templateid>
2019-08-22 08:40:09 -04:00