elasticsearch

Adrien Grand 8d6a41f671 Nested queries should avoid adding unnecessary filters when possible. (#23079)

When nested objects are present in the mappings, many queries get deoptimized due to the need to exclude documents that are not in the right space. For instance, a filter is applied to all queries that prevents them from matching non-root documents (`+*:* -_type:__*`). Moreover, a filter is applied to all child queries of `nested` queries in order to make sure that the child query only matches child documents (`_type:__nested_path`), which is required by `ToParentBlockJoinQuery` (the Lucene query behind Elasticsearch's `nested` queries).

These additional filters slow down `nested` queries. In 1.7 and earlier, the cost was somewhat amortized by the fact that we cached filters very aggressively. However, this has proven to be a significant source of slowdowns since 2.0 for users of `nested` mappings and queries, see #20797.

This change makes the filtering a bit smarter. For instance, if the query is a `match_all` query, then we need to exclude nested docs. However, if the query is `foo: bar` then it may only match root documents since `foo` is a top-level field, so no additional filtering is required.

Another improvement is to use a `FILTER` clause on all types rather than a `MUST_NOT` clause on all nested paths when possible, since `FILTER` clauses are more efficient.

Here are some examples of queries and how they get rewritten:

```
"match_all": {}
```

This query gets rewritten to `ConstantScore(+*:* -_type:__*)` on master and `ConstantScore(_type:AutomatonQuery {\norg.apache.lucene.util.automaton.Automaton@4371da44})` with this change. The automaton is the complement of `_type:__*` so it matches the same documents, but is faster since it is now a positive clause. Simplistic performance testing on a 10M index where each root document has 5 nested documents on average gave a latency of 420ms on master and 90ms with this change applied.

```
"term": {
  "foo": {
    "value": "0"
  }
}
```

This query is rewritten to `+foo:0 #(ConstantScore(+*:* -_type:__*))^0.0` on master and `foo:0` with this change: we do not need to filter nested docs out since the query cannot match nested docs. While doing performance testing in the same conditions as above, response times went from 250ms to 50ms.

```
"nested": {
  "path": "nested",
  "query": {
    "term": {
      "nested.foo": {
        "value": "0"
      }
    }
  }
}
```

This query is rewritten to `+ToParentBlockJoinQuery (+nested.foo:0 #_type:__nested) #(ConstantScore(+*:* -_type:__*))^0.0` on master and `ToParentBlockJoinQuery (nested.foo:0)` with this change. The top-level filter (`-_type:__*`) could be removed since `nested` queries only match documents of the parent space. The child filter (`#_type:__nested`) could be removed as well: the child query may only match nested docs, because the `nested` object has both `include_in_parent` and `include_in_root` set to `false`. While doing performance testing in the same conditions as above, response times went from 850ms to 270ms.

2017-02-14 16:05:19 +01:00
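The decision described in this commit message can be illustrated with a small sketch. This is not the Elasticsearch implementation: the classes `MatchAll`, `Term`, `Nested` and the helpers `needs_root_filter` and `needs_child_filter` are hypothetical names made up for illustration. It assumes a single level of nesting, that a field belongs to a nested object exactly when its name starts with `<path>.`, and that `include_in_parent` and `include_in_root` are both `false`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MatchAll:
    """Stands in for a `match_all` query (hypothetical model class)."""


@dataclass(frozen=True)
class Term:
    """Stands in for a `term` query on a single field."""
    field: str
    value: str


@dataclass(frozen=True)
class Nested:
    """Stands in for a `nested` query wrapping a child query."""
    path: str
    query: object


def _under(field, path):
    # Assumption: fields of a nested object are named "<path>.<name>".
    return field.startswith(path + ".")


def needs_root_filter(query, nested_paths):
    """Return True if the top-level query might match nested (non-root) docs."""
    if isinstance(query, Term):
        # A term on a top-level field can only match root documents.
        return any(_under(query.field, p) for p in nested_paths)
    if isinstance(query, Nested):
        # A block join only returns documents of the parent space.
        return False
    # match_all and anything unknown: stay conservative and keep the filter.
    return True


def needs_child_filter(nested):
    """Return True if the child query might match docs outside nested.path."""
    child = nested.query
    if isinstance(child, Term):
        return not _under(child.field, nested.path)
    return True


paths = {"nested"}
print(needs_root_filter(MatchAll(), paths))        # filter still required
print(needs_root_filter(Term("foo", "0"), paths))  # filter can be dropped
```

The sketch errs on the side of keeping a filter whenever it cannot prove the filter is redundant, which mirrors the "when possible" wording of the commit title.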
..
aggregations	Use `typed_keys` parameter to prefix suggester names by type in search responses (#23080)	2017-02-10 10:53:38 +01:00
analysis	Consolify docs/reference/analysis/tokenfilters/pattern-capture-tokenfilter.asciidoc. (#23050)	2017-02-13 11:00:12 +01:00
cat	Fix duplicates from search.query (#22701)	2017-01-20 18:45:10 +01:00
cluster	Docs: Consoleify cluster and indices settings docs (#23030)	2017-02-10 14:57:43 -08:00
docs	Fixed bad asciidoc in delete-by-query	2017-02-09 20:14:56 +01:00
how-to	Improve wording in recipes docs	2017-01-17 21:00:36 -05:00
images	…
index-modules	…
indices	Docs: Consoleify cluster and indices settings docs (#23030)	2017-02-10 14:57:43 -08:00
ingest	…
mapping	Disallow include_in_all for 6.0+ indices	2017-02-07 19:31:51 -07:00
migration	Disallow include_in_all for 6.0+ indices	2017-02-07 19:31:51 -07:00
modules	Add a note about `cluster.routing.allocation.node_concurrent_recoveries` (#23160)	2017-02-14 14:14:41 +02:00
painless-api-reference	Expose multi-valued dates to scripts and document painless's date functions (#22875)	2017-02-01 21:57:07 -05:00
query-dsl	Add note about min_score filtering efficiency (#23109)	2017-02-13 12:15:01 +01:00
search	Nested queries should avoid adding unnecessary filters when possible. (#23079)	2017-02-14 16:05:19 +01:00
setup	Adding `ansible-elasticsearch` to list of CM tools (#23058)	2017-02-09 21:14:30 +01:00
testing	…
aggregations.asciidoc	…
analysis.asciidoc	…
api-conventions.asciidoc	Optionally require a valid content type for all rest requests with content (#22691)	2017-02-02 14:07:13 -05:00
cat.asciidoc	…
cluster.asciidoc	…
docs.asciidoc	…
getting-started.asciidoc	Replaced absolute URLs in docs with attributes	2017-02-04 12:05:03 +01:00
glossary.asciidoc	…
how-to.asciidoc	Correct grammar in list in how-to docs	2017-01-17 20:57:22 -05:00
index-modules.asciidoc	Allow an index to be partitioned with custom routing (#22274)	2017-01-18 08:51:23 +01:00
index.asciidoc	Centralised doc versions in docs/Versions.asciidoc	2017-02-04 11:16:19 +01:00
indices.asciidoc	…
ingest.asciidoc	…
mapping.asciidoc	Disallow include_in_all for 6.0+ indices	2017-02-07 19:31:51 -07:00
modules.asciidoc	Docs: Cross-cluster search doc wasn't being included	2017-01-18 10:02:51 +01:00
painless-api-reference.asciidoc	Generate reference links for painless API (#22775)	2017-01-26 10:39:19 -05:00
query-dsl.asciidoc	…
redirects.asciidoc	Update redirects.asciidoc (#23148)	2017-02-13 16:23:25 +01:00
release-notes.asciidoc	…
search.asciidoc	…
setup.asciidoc	Docs: Add setup section for the keystore tool and secure settings (#22838)	2017-01-30 14:56:45 -08:00
testing.asciidoc	…