Users often mistakenly map numeric IDs to numeric datatypes. However,
this is often slow for the `term` and other term-level queries.
The "Tune for search speed" docs includes advice for mapping numeric
IDs to `keyword` fields. However, this tip is not included in the
`numeric` or `keyword` field datatype doc pages.
This rewords the tip in the "Tune for search speed" docs, relocates it
to the `numeric` field docs, and reuses it using tagged regions.
Co-authored-by: Daniel Huang <danielhuang@tencent.com>
This is a spinoff of #48130 that generalizes the proposal to allow early termination with the composite aggregation when leading sources match a prefix or the entire index sort specification.
In such case the composite aggregation can use the index sort natural order to early terminate the collection when it reaches a composite key that is greater than the bottom of the queue.
The optimization is also applicable when a query other than match_all is provided. However the optimization is deactivated for sources that match the index sort in the following cases:
* Multi-valued source, in such case early termination is not possible.
* missing_bucket is set to true
Lucene 8.4 added support for "CONTAINS", therefore in this commit those
changes are integrated in Elasticsearch. This commit contains as well a
bug fix when querying with a geometry collection with "DISJOINT" relation.
Remote cluster stats API currently only returns useful information if
the strategy in use is the SNIFF mode. This PR modifies the API to
provide relevant information if the user is in the SIMPLE mode. This
information is the configured addresses, max socket connections, and
open socket connections.
* [DOCS] Document JVM node stats
Documents the `jvm` parameters returned by the `_nodes/stats` API.
Co-Authored-By: James Baiera <james.baiera@gmail.com>
The example snippets in the percentile rank agg docs use a test dataset
named `latency`, which is generated from docs/gradle.build.
At some point the dataset and example snippets were updated, but the
text surrounding the snippets was not. This means the text and the
example snippets shown no longer match up.
This corrects that by changing the snippets using /TESTRESPONSE magic comments.
Co-Authored-By: Przemysław Witek <przemyslaw.witek@elastic.co>
Co-Authored-By: David Roberts <dave.roberts@elastic.co>
Co-Authored-By: Ed Savage <32410745+edsavage@users.noreply.github.com>
* CSV Processor for Ingest
This change adds new ingest processor that breaks line from CSV file into separate fields.
By default it conforms to RFC 4180 but can be tweaked.
Closes#49113
Removes a reference to shadow replicas from the cat shards API docs
and a comment in cluster/routing/UnassignedInfo.java.
Shadow replicas were removed with #23906.
This adds a new `randomize_seed` for regression and classification.
When not explicitly set, the seed is randomly generated. One can
reuse the seed in a similar job in order to ensure the same docs
are picked for training.
The current snippets in the synced flush docs can cause conflicts with
other background syncs, such as the global checkpoint sync or retention
lease sync, in the docs tests.
This skips tests for those snippets to avoid conflicts.
In the shape query docs, the index mapping snippet uses the "geometry"
shape field mapping. However, the doc index snippet uses the "location"
property.
This changes the "location" property to "geometry". It also adds a
comment containing the search result snippet. This should prevent
similar issues in the future.
This commit changes the recommended repository file for rpm based
systems to be disabled by default. This is a safer practice so upgrades
of the system do no accidentally upgrade elasticsearch itself.
closes#30660
* Allow list of IPs in geoip ingest processor
This change lets you use array of IPs in addition to string in geoip processor source field.
It will set array containing geoip data for each element in source, unless first_only parameter
option is enabled, then only first found will be returned.
Closes#46193
Adds documentation for the `minimum_should_match` parameter to the `bool` query docs. Includes docs for the default values:
- `1` if the `bool` query includes at least one `should` clause and no `must` or `filter` clauses
- `0` otherwise
* Adds a title abbreviation
* Updates the description and adds a Lucene link
* Reformats the parameters section
* Adds analyze, custom analyzer, and custom filter snippets
Relates to #44726.
The documentation contained a small error, as bytes and duration was not
properly converted to a number and thus remained a string.
The documentation is now also properly tested by providing a full blown
simulate pipeline example.
* Creates a prerequisites section in the cross-cluster replication (CCR)
overview.
* Adds concise definitions for local and remote cluster in a CCR context.
* Documents that the ES version of the local cluster must be the same
or a newer compatible version as the remote cluster.
When the enrich processor appends enrich data to an incoming document,
it adds a `target_field` to contain the enrich data.
This `target_field` contains both the `match_field` AND `enrich_fields`
specified in the enrich policy.
Previously, this was reflected in the documented example but not
explicitly stated. This adds several explicit statements to the docs.
The "Restore any snapshots as required" step is a trap: it's somewhere between
tricky and impossible to restore multiple clusters into a single one.
Also add a note about configuring discovery during a rolling upgrade to
proscribe any rare cases where you might accidentally autobootstrap during the
upgrade.
Reindex sort never gave a guarantee about the order of documents being
indexed into the destination, though it could give a sense of locality
of source data.
It prevents us from doing resilient reindex and other optimizations and
it has therefore been deprecated.
Related to #47567
This adds a `_source` setting under the `source` setting of a data
frame analytics config. The new `_source` is reusing the structure
of a `FetchSourceContext` like `analyzed_fields` does. Specifying
includes and excludes for source allows selecting which fields
will get reindexed and will be available in the destination index.
Closes#49531