During highlighting, we now load all values that were copied into the field
through copy_to. So there's no longer a reason to set 'store: true' to account
for fields not available in _source.
In some cases when the rate aggregation is not a child of a date histogram
aggregation, it is not possible to determine the actual size of the date
histogram bucket. In this case the rate aggregation now throws an exception.
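For reference, a minimal sketch of the supported shape, with the rate nested under a date histogram (index and field names are illustrative):
```
GET my-index/_search
{
  "size": 0,
  "aggs": {
    "by_month": {
      "date_histogram": {
        "field": "@timestamp",
        "calendar_interval": "month"
      },
      "aggs": {
        "monthly_sales_rate": {
          "rate": {
            "field": "price",
            "unit": "month"
          }
        }
      }
    }
  }
}
```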
Closes #63703
Previously, geo_shape support was only mentioned in a dedicated x-pack
section. This may be misleading, as the introductory paragraph only
mentions geo_point.
Co-authored-by: James Rodewig <40268737+jrodewig@users.noreply.github.com>
* Adds the capability to have functions with two optional arguments
* Adds two new optional arguments to the `PERCENTILE()` and
`PERCENTILE_RANK()` functions, namely the method and
method_parameter, which can be: 1) `tdigest` with a double `compression`
parameter, or 2) `hdr` with an integer `number_of_digits`
parameter (see the sketch after this list).
* Integration tests
* Documentation updates
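A quick sketch of the new syntax, with illustrative table and column names:
```
SELECT PERCENTILE(salary, 95, 'tdigest', 100.0) AS p95 FROM emp;
SELECT PERCENTILE_RANK(salary, 65000, 'hdr', 3) AS below_65k FROM emp;
```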
Closes #63567
This PR adds detail to the explanation of the soft_limit
memory_status in ML job stats. A consequence that was not
mentioned before is that examples are not added to category
definitions.
Relates elastic/ml-cpp#1590
A metric aggregation that aggregates a set of points as
a GeoJSON LineString ordered by some sort parameter.
#### specifics
A `geo_line` aggregation request would specify a `geo_point` field, as well
as a `sort` field. `geo_point` represents the values used in the LineString,
while the `sort` values will be used as the total ordering of the points.
The `sort` field would support any numeric field, including date.
#### sample usage
```
{
  "query": {
    "bool": {
      "must": [
        { "term": { "person": "004" } },
        { "term": { "trajectory": "20090131002206.plt" } }
      ]
    }
  },
  "aggs": {
    "make_line": {
      "geo_line": {
        "point": { "field": "location" },
        "sort": { "field": "timestamp" },
        "include_sort": true,
        "sort_order": "desc",
        "size": 15
      }
    }
  }
}
```
#### sample response
```
{
  "took": 21,
  "timed_out": false,
  "_shards": {...},
  "hits": {...},
  "aggregations": {
    "make_line": {
      "type": "LineString",
      "coordinates": [
        [
          121.52926194481552,
          38.92878997139633
        ],
        [
          121.52922699227929,
          38.92876998055726
        ]
      ]
    }
  }
}
```
#### visual response
<img width="540" alt="Screen Shot 2019-04-26 at 9 40 07 AM" src="https://user-images.githubusercontent.com/388837/56834977-cf278e00-6827-11e9-9c93-005ed48433cc.png">
#### limitations
Due to the cardinality of points, an initial maximum of 10k points
will be used. This should support many use cases.
One way to overcome this limitation is to keep a PriorityQueue of
points and simplify the line once it hits this max. If simplifying
makes sense, it may be a nice option in general, with a parameter
to specify how aggressively to simplify. This parameter could be
the number of points. An example algorithm one could use with a PriorityQueue:
https://bost.ocks.org/mike/simplify/. This would still require O(m) space, where m
is the number of points returned, and would also require heapifying triangles
sorted by their areas, which would be O(log(m)) operations. Since sorting is done
anyway, simplifying would still be an O(n log(m)) operation, where n is the total number
of points to filter. Something to explore.
closes #41649
The _Important Elasticsearch configuration_ docs list a number of items
that you should consider before moving to production. Today this list
does not include configuring snapshots, even though they're very
important to have in production. This commit addresses that omission,
removes some repetition from the introductory paragraphs, and notes that
this config is handled for you on Cloud.
Co-authored-by: James Rodewig <40268737+jrodewig@users.noreply.github.com>
* Clarify that field data cache includes global ordinals
* Describe that the cache should be cleared once the limit is reached
* Clarify that the `_id` field does not support aggregations anymore
* Fold the `fielddata` mapping parameter page into the `text` field docs
* Improve cross-linking
This change adds an extra piece of information,
limits.total_ml_memory, to the ML info response.
This returns the total amount of memory that ML
is permitted to use for native processes across
all ML nodes in the cluster. Some of this may
already be in use; the value returned is total,
not available ML memory.
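A rough sketch of where the new field appears in the `GET _ml/info` response (surrounding fields elided, value illustrative):
```
GET _ml/info
{
  "limits": {
    "total_ml_memory": "16gb",
    ...
  },
  ...
}
```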
Clarifies differences between the
`cluster.routing.allocation.total_shards_per_node` and
`cluster.max_shards_per_node` cluster settings.
Closes #51839
Co-authored-by: Gordon Brown <arcsech@gmail.com>
This new API provides a way for users to upgrade their own anomaly job
model snapshots.
To upgrade a snapshot the following is done:
- Open a native process given the job id and the desired snapshot id
- load the snapshot to the process
- write the snapshot again from the native task (now updated via the
native process)
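A sketch of the request shape, assuming illustrative job and snapshot IDs:
```
POST _ml/anomaly_detectors/my-job/model_snapshots/1575402236/_upgrade
```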
relates #64154
Include the attempted 'match_mapping_type' in the message,
so that it is clearer that multiple validation attempts have occurred.
Dynamic template validation was recently added via #51233 and
there was some confusion over the deprecation message itself.
(In 7.x only a deprecation warning will be emitted; from 8.0
an error will be returned.)
This adds support for the searchable_snapshot ILM action in the hot phase.
We define a series of actions that cannot be executed after the index has been
mounted as a searchable snapshot. Namely: freeze, forcemerge, shrink,
and searchable_snapshot (also available in the cold phase).
By virtue of snapshotting/restoring a managed index or updating an ILM policy while it
is executing for an index, these actions could end up being executed on an index that was
mounted as a searchable snapshot in the hot phase. If this happens, the actions will be
skipped entirely and ILM will not move into the ERROR step.
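A minimal policy sketch using the action in the hot phase (policy and repository names are illustrative):
```
PUT _ilm/policy/my-policy
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": {
            "max_size": "50gb"
          },
          "searchable_snapshot": {
            "snapshot_repository": "my-repository"
          }
        }
      }
    }
  }
}
```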
In the process of developing a new implementation for the Elasticsearch Rollups functionality we came up with the concept of the aggregate metric field type.
The aggregate_metric_double field type can store the results of aggregations (currently min, max, sum, value_count and avg are supported - more to come).
This field allows us to run (min, max, sum, value_count, avg) aggregations on the container field and the field will return the correct metric depending on the aggregation that is computed.
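A mapping sketch, assuming illustrative index and field names:
```
PUT my-metrics
{
  "mappings": {
    "properties": {
      "response_time": {
        "type": "aggregate_metric_double",
        "metrics": [ "min", "max", "sum", "value_count" ],
        "default_metric": "max"
      }
    }
  }
}
```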
I'm not sure whether this setting was left here deliberately or by accident.
With all other node role definitions having changed syntax from `node.xxx` to `node.roles: [ ]`, the ingest one is the only one left behind.
Added the capability to delete autoscaling policies by pattern, allowing
to for instance do:
```
DELETE _autoscaling/policy/*
```
to delete all autoscaling policies. If a wildcard is involved, no
matches are required.
* Remove constant_keyword from SQL docs
`constant_keyword` removed as distinct type from SQL in #60524.
Co-authored-by: James Rodewig <40268737+jrodewig@users.noreply.github.com>
Adds a limit to the maximum number of snapshots that are allowed
to be added to a snapshot repository as a safety measure of last resort
against repositories that grow to an unmanageable size due to e.g. incorrect SLM
settings.
Co-authored-by: David Turner <david.turner@elastic.co>
Bucket aggregations compute bucket doc_count values by incrementing the doc_count by 1 for every document collected in the bucket.
When using summary fields (such as aggregate_metric_double) one field may represent more than one document. To provide this functionality we have implemented a new field mapper (named doc_count field mapper). This field is a positive integer representing the number of documents aggregated in a single summary field.
Bucket aggregations will check if a field of type doc_count exists in a document and will take this value into consideration when computing doc counts.
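A sketch of how a pre-aggregated document might supply its own count, assuming the value is provided as a `_doc_count` field in the document source (index, field, and values are illustrative):
```
PUT my-index/_doc/1
{
  "_doc_count": 45,
  "my_histogram": {
    "values": [ 0.1, 0.25, 0.35 ],
    "counts": [ 10, 15, 20 ]
  }
}
```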
This commit adds data stream support to CCR's auto following by making the following changes:
* When the auto follow coordinator iterates over the candidate indices to follow,
the auto follow coordinator also checks whether the index is part of a data stream and
if the name of data stream also matches with the auto follow pattern then the index
will be auto followed.
* When following an index, the put follow api also checks whether that index is part
of a data stream and if so then also replicates the data stream definition to the
local cluster.
* In order for the follow index api to determine whether an index is part of a data
stream, the cluster state api was modified to also fetch the data stream definition
of the cluster state when the state is queried for specific indices only.
When a data stream is auto followed, only new backing indices are auto followed.
This is in line with how time based indices patterns are replicated today. This
means that the data stream isn't copied 1 to 1 into the local cluster. The local
cluster's data stream definition contains the same name, timestamp field and
generation, but the list of backing indices may be different (depending on when
a data stream was auto followed).
Closes #56259
Add a roles specification to autoscaling policies. This is used to map
the policy to a set of nodes governed by the policy. The list of roles
is mandatory when adding a policy, optional on updates.
This commit also removes the outer level "policy" element from autoscaling
policy PUT and GET requests.
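A request sketch after this change, without the outer "policy" element (policy name and decider are illustrative):
```
PUT _autoscaling/policy/my_autoscaling_policy
{
  "roles": [ "data_hot" ],
  "deciders": {
    "fixed": {}
  }
}
```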
Running the Elasticsearch Docker image with a different GID is
possible but trappy, since at present all the ES files are only
readable by the user and group. This PR documents a Docker CLI flag
that fixes this situation, by ensuring the container user is added
to the default group (which is `root`, GID 0).
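A sketch of the documented invocation, assuming `--group-add 0` is the flag in question (UID/GID and image tag are illustrative):
```
docker run --user 1000:1001 --group-add 0 \
  -p 9200:9200 -p 9300:9300 \
  docker.elastic.co/elasticsearch/elasticsearch:7.10.0
```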
I also added a test for this case, and refactored the Docker tests
to use a builder pattern for constructing the `docker run` command.
The existing code was becoming unwieldy and hard to change.
- Replaces more abstract docs about object structure and values source with task-based examples.
- Relocates several sections from the current `misc.asciidoc` file.
- Alphabetically sorts agg categories in the nav.
- Removes the matrix agg family. Moves the stats matrix agg under the metric agg family
Co-authored-by: debadair <debadair@elastic.co>
The docs for the geoip processor database_file option appear to indicate
that all geoip databases are in the config directory. This is leftover
legacy from when this was the case when ingest-geoip was a plugin, but
it is no longer true as the built-in databases now ship inside the
ingest-geoip module that is bundled by default. This commit clarifies
those docs.
Co-authored-by: Jakob Reiter <jakommo@users.noreply.github.com>
After #63811 it became clear to me that `postCollect` is kind of
dangerous and not all that useful. So this removes it.
The trouble with `postCollect` is that it all happened right after we
finished calling `collect` on the `LeafBucketCollectors` but before we
built the aggregation results. But in #63811 we found out that we can't
call `postCollect` on the children of `parent` or `child` aggregators
until we know *which* aggregation results we're building.
So this removes `postCollect` and moves all of the things we did at
post-collect phase into `buildAggregations` or into hooks called in
those methods.
This commit clarifies that the preferred method for setting the heap
size is via jvm.options.d and that using the ES_JAVA_OPTS environment
variable is discouraged for production deployments.
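For example, a file such as `jvm.options.d/heap.options` (file name and sizes are illustrative) could contain:
```
-Xms4g
-Xmx4g
```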
This commit adjusts the defaults for the tiered data roles so that they
are enabled by default, or if the node has the legacy data role. This
ensures that the default experience is that the tiered data roles are
enabled.
To fully specify the behavior for the tiered data roles:
- starting a new node with the defaults: enabled
- starting a new node with node.roles configured: enabled if and only
if the tiered data roles are explicitly configured, independently
of the node having the data role
- starting a new node with node.data enabled: enabled unless the
tiered data roles are explicitly disabled
- starting a new node with node.data disabled: disabled unless the
tiered data roles are explicitly enabled
Closes #20640.
This PR introduces a new parameter to v2 templates, `allow_auto_create`,
which allows templates to override the cluster setting `auto_create_index`.
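A sketch of a v2 template using the new parameter (template name and pattern are illustrative):
```
PUT _index_template/my-template
{
  "index_patterns": [ "my-app-*" ],
  "allow_auto_create": true,
  "template": {
    "settings": {
      "number_of_shards": 1
    }
  }
}
```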
Notes:
* `AutoCreateIndex` now looks for a matching v2 template, and if its
`allow_auto_create` setting is true, it overrides the usual logic.
* `TransportBulkAction` previously used `AutoCreateIndex` to check
whether missing indices should be created. We now rely on
`AutoCreateAction`, which was already differentiating between creating
indices and creating data streams. I've updated `AutoCreateAction` to
use `AutoCreateIndex`. Data streams are also influenced by
`allow_auto_create`, in that their default auto-create behaviour can
be disabled with this setting.
* Most of the Java file changes are due to introducing an extra
constructor parameter to `ComposableIndexTemplate`.
* I've added the new setting to various x-pack templates
* I added a YAML test to check that watches can be created even when
`auto_create_index` is `false`.
When running ML, sometimes it is best to automatically adjust the
memory allotted for machine learning based on the node size
and how much space is given to the JVM.
This commit adds a new static setting xpack.ml.use_auto_machine_memory_percent for
allowing this dynamic calculation. The old setting remains as a backup
just in case the limit cannot be automatically determined due to
lack of information.
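A sketch of enabling the new setting in `elasticsearch.yml`:
```
xpack.ml.use_auto_machine_memory_percent: true
```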
Closes #63795
* Allow mixing set-based and regexp-based include and exclude
* Coding style
* Disallow having both set and regexp include (resp. exclude)
* Test correctness of every combination of include/exclude
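A sketch of a now-valid combination, mixing a regexp-based `include` with a set-based `exclude` (field and values are illustrative):
```
GET my-index/_search
{
  "size": 0,
  "aggs": {
    "tags": {
      "terms": {
        "field": "tag",
        "include": "env_.*",
        "exclude": [ "env_test", "env_qa" ]
      }
    }
  }
}
```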
Renamed decision API to capacity. Responses now prefer objects/maps over
arrays. Removed mention of tier, using policies as the outer map and
total for the policy-wide total capacity.
This adds a new flag `exclude_generated` for GET transform API.
This flag is useful when a transform needs to be cloned within a cluster or exported/imported between clusters.
It removes certain fields that are not able to be set via the PUT api (e.g. version, create_time).
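A sketch of the flag in use (transform ID is illustrative):
```
GET _transform/my-transform?exclude_generated=true
```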
relates https://github.com/elastic/elasticsearch/issues/63055
When exporting and cloning ml configurations in a cluster it can be
frustrating to remove all the fields that were generated by
the plugin, especially as the number of these fields changes
from version to version.
This flag, exclude_generated, allows the GET config APIs to return
configurations with these generated fields removed.
APIs supporting this flag:
- GET _ml/anomaly_detection/<job_id>
- GET _ml/datafeeds/<datafeed_id>
- GET _ml/data_frame/analytics/<analytics_id>
The following fields are not returned in the objects:
- any field that is not user settable (e.g. version, create_time)
- any field that is a calculated default value (e.g. datafeed chunking_config)
- any field that is automatically set via another Elastic stack process (e.g. anomaly job custom_settings.created_by)
relates to #63055
Renames data frame analytics _evaluate API results as follows:
- per class accuracy renamed from `accuracy` to `value`
- per class precision renamed from `precision` to `value`
- per class recall renamed from `recall` to `value`
- auc_roc `score` renamed to `value` for both outlier detection and classification
The original comment mentioned issue #48583, but issue #48941
is specifically open for this mute. However, this is
inappropriate, as the underlying reason the test cannot be
unmuted is the same as for all the other tests skipped with the
comment "Kibana sample data": issues #51572, #51576 and #51678.
Closes #48941
We've identified two important enhancements that may affect the API. We expect
any API changes from these enhancements to be minor, but want to leave open the
possibility for small breaks. For example, we may end up returning unmapped
fields by default, or omitting nested fields from the root hit. The impact to
users should be quite small.
We're tracking the issues we need to resolve before removing the 'beta' label
here: #60985.
The original description of per-field boosting is incorrect. Boosting a
field does not imply that it is more important relative to other fields.
It simply means that the score is multiplied by the supplied boost
value. Due to the differences in each field's term and document
statistics, it's not possible to imply relative importance of fields
based on the per-field boost value alone.
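For context, a sketch showing a per-field boost acting purely as a score multiplier (index and field names are illustrative):
```
GET my-index/_search
{
  "query": {
    "multi_match": {
      "query": "quick brown fox",
      "fields": [ "title^2", "body" ]
    }
  }
}
```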
Adds support for the missing (bucket) aggregation, which counts docs that are missing a value for the configured field,
in transforms. The output is mapped to name:count; the mapping type is long.
The current _update_by_query documentation mentions a scroll_size default of 100 and later another default of 1000.
We use the default of 1000 defined in AbstractBulkByScrollRequest and this PR changes the documentation accordingly.
Closes #63637
For a query like `SELECT name FROM test WHERE name LIKE '%c*'` ES SQL
generates an error. `*` is not a special character in a `LIKE` construct
and it's not expected to need escaping, so the previous query
should work as is.
In the LIKE pattern any `*` character was treated as an invalid character
and the usage of `%` or `_` was suggested instead. But `*` is a valid,
acceptable non-wildcard character on the right side of the `LIKE` operator.
Fix: #55108
Closes #51670, closes #50838.
Introduce a tiny base image for Docker builds. It aims to create a basic filesystem with as little as possible, which is mostly glibc, busybox and bash. A statically-built curl is also provided.
We still use CentOS 8 as a base. All the fun stuff happens in the Dockerfile.
We standardize on some metadata entries that we plan to later leverage
in Kibana in order to provide a better out-of-the-box experience, e.g.
different visualizations make sense on gauges and counters.
This adds general overview documentation for data tiers,
the data tiers specific node roles, and their application in
ILM.
Co-authored-by: Lee Hinman <dakrone@users.noreply.github.com>
Co-authored-by: debadair <debadair@elastic.co>
The current link points to an obsolete site, which is no longer maintained.
Co-authored-by: Stefan Walter <67258699+rd-stefan-walter@users.noreply.github.com>
This PR adds deprecation warnings when accessing System Indices via the REST layer. At this time, these warnings are only enabled for Snapshot builds by default, to allow projects external to Elasticsearch additional time to adjust their access patterns.
Deprecation warnings will be triggered by all REST requests which access registered System Indices, except for a few purpose-specific APIs which access System Indices as an implementation detail and will continue to allow access to system indices by default:
- `GET _cluster/health`
- `GET {index}/_recovery`
- `GET _cluster/allocation/explain`
- `GET _cluster/state`
- `POST _cluster/reroute`
- `GET {index}/_stats`
- `GET {index}/_segments`
- `GET {index}/_shard_stores`
- `GET _cat/[indices,aliases,health,recovery,shards,segments]`
Deprecation warnings for accessing system indices take the form:
```
this request accesses system indices: [.some_system_index], but in a future major version, direct access to system indices will be prevented by default
```
This fixes fields retrieval on unsigned_long field
1) For docvalue_fields a custom UnsignedLongLeafFieldData::getLeafValueFetcher
is implemented that correctly retrieves doc values.
2) For stored fields, an error was fixed in how UnsignedLongFieldMapper
stored values. Before, they were incorrectly
stored in the shifted format; now they are stored as the original
values in String format.
Relates to #60050
Make EQL case sensitive by default and adapt some of the string functions
Remove the case sensitive option from Between string function
Add case_insensitive option to term and wildcard queries usage
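A sketch of the option on a term query (field and value are illustrative):
```
GET my-index/_search
{
  "query": {
    "term": {
      "user.name": {
        "value": "Kimchy",
        "case_insensitive": true
      }
    }
  }
}
```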
This adds the new `for_export` flag to the following APIs:
- GET _ml/anomaly_detection/<job_id>
- GET _ml/datafeeds/<datafeed_id>
- GET _ml/data_frame/analytics/<analytics_id>
The flag is designed for cloning or exporting configuration objects to later be put into the same cluster or a separate cluster.
The following fields are not returned in the objects:
- any field that is not user settable (e.g. version, create_time)
- any field that is a calculated default value (e.g. datafeed chunking_config)
- any field that would effectively require changing to be of use (e.g. datafeed job_id)
- any field that is automatically set via another Elastic stack process (e.g. anomaly job custom_settings.created_by)
closes https://github.com/elastic/elasticsearch/issues/63055
This commit adds telemetry for our data tier formalization. This telemetry helps determine the
topology of the cluster with regard to the content, hot, warm, & cold tiers/roles.
An example of the telemetry looks like:
```
GET /_xpack/usage?human
{
  ...
  "data_tiers" : {
    "available" : true,
    "enabled" : true,
    "data_warm" : {
      ...
    },
    "data_cold" : {
      ...
    },
    "data_content" : {
      "node_count" : 1,
      "index_count" : 6,
      "total_shard_count" : 6,
      "primary_shard_count" : 6,
      "doc_count" : 71,
      "total_size" : "59.6kb",
      "total_size_bytes" : 61110,
      "primary_size" : "59.6kb",
      "primary_size_bytes" : 61110,
      "primary_shard_size_avg" : "9.9kb",
      "primary_shard_size_avg_bytes" : 10185,
      "primary_shard_size_median" : "8kb",
      "primary_shard_size_median_bytes" : 8254,
      "primary_shard_size_mad" : "7.2kb",
      "primary_shard_size_mad_bytes" : 7391
    },
    "data_hot" : {
      ...
    }
  }
}
```
The fields are as follows:
- node_count :: number of nodes with this tier/role
- index_count :: number of indices on this tier
- total_shard_count :: total number of shards for all nodes in this tier
- primary_shard_count :: number of primary shards for all nodes in this tier
- doc_count :: number of documents for all nodes in this tier
- total_size_bytes :: total number of bytes for all shards for all nodes in this tier
- primary_size_bytes :: number of bytes for all primary shards on all nodes in this tier
- primary_shard_size_avg_bytes :: average shard size for primary shard in this tier
- primary_shard_size_median_bytes :: median shard size for primary shard in this tier
- primary_shard_size_mad_bytes :: [median absolute deviation](https://en.wikipedia.org/wiki/Median_absolute_deviation) of shard size for primary shard in this tier
Relates to #60848
We don't need a special TypeFieldMapper for anything in particular; all access
to the type field can be done via a TypeFieldType that issues appropriate
deprecation warnings.
Relates to #41059
* [DOCS] Adds limitation item about using scripts in transforms.
* [DOCS] Adds scripts related limitation item to transforms docs.
* [DOCS] Merges two bullets, adds a new one, and removes last sentences.
* [DOCS] Refines last bullet.
* [DOCS] Addresses feedback.
* [DOCS] Removes low info content.
We support `"""` in `console` snippets to emulate kibana's CONSOLE.
CONSOLE also spits out `"""` when a json field contains a new line or a
double quote. This adds support for those sorts of responses to the
handling of `console-response` snippets.
Revises the current 'How to avoid oversharding' docs to incorporate
information from our [shard sizing blog post][0].
Changes:
* Streamlines introduction
* Adds "Things to remember" section to describe how shards work
* Adds "Guidelines" section based on blog tips
* Creates a "Fix an oversharded cluster" section
[0]: https://www.elastic.co/blog/how-many-shards-should-i-have-in-my-elasticsearch-cluster
This adds the network property from the MaxMind Geo ASN database.
This enables analysis of IP data based on the subnets that MaxMind have
previously identified for ASN networks.
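A pipeline sketch requesting the new property, assuming the bundled ASN database file name shown here:
```
PUT _ingest/pipeline/asn-enrich
{
  "processors": [
    {
      "geoip": {
        "field": "source.ip",
        "database_file": "GeoLite2-ASN.mmdb",
        "properties": [ "asn", "organization_name", "network" ]
      }
    }
  ]
}
```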
closes #60942
If `track_total_hits=true` is used, the exact value of the number of hits is returned - i.e. the value is effectively limitless, and not the default value of 10,000
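A minimal sketch:
```
GET my-index/_search
{
  "track_total_hits": true,
  "query": { "match_all": {} }
}
```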
Co-authored-by: AndyHunt66 <andrew.hunt@elastic.co>
This field type supports
- indexing of integer values from [0, 18446744073709551615]
- precise queries (term, range)
- precise sort and terms aggregations
- other aggregations are based on conversion of long values
to double and can be imprecise for large values.
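A mapping sketch (index and field names are illustrative):
```
PUT my-counters
{
  "mappings": {
    "properties": {
      "packet_count": {
        "type": "unsigned_long"
      }
    }
  }
}
```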
Closes #32434
Follow up to #62623, this commit removes support in 8.x for index-time boosts.
There is no longer a boost field on MappedFieldType. Indexes created in 8.x
and after will throw exceptions if a boost parameter is included in mappings,
and indexes created in 7.x will emit warnings.
This commit adds a dedicated threadpool for system index write
operations. The dedicated resources for system index writes serve as
a means to ensure that user activity does not block important system
operations from occurring such as the management of users and roles.
Since `=` is rarely used and is undocumented, we remove its support for
equality comparisons, keeping `==` as the only option. `=` is now only
used for assignments like in `maxspan=10m`.
Closes: #62650
This PR adds a new 'version' field type that allows indexing string values
representing software versions similar to the ones defined in the Semantic
Versioning definition (semver.org). The field behaves very similarly to a
'keyword' field but allows efficient sorting and range queries that take into
account the special ordering needed for version strings. For example, the main
version parts are sorted numerically (i.e. 2.0.0 < 11.0.0), whereas this wouldn't
be possible with 'keyword' fields today.
Valid version values are similar to the Semantic Versioning definition, with the
notable exception that in addition to the "main" version consisting of
major.minor.patch, we allow fewer or more than three numeric identifiers, i.e.
"1.2" or "1.4.6.123.12" are treated as valid too.
Relates to #48878
This commit adds the `index.routing.allocation.prefer._tier` setting to the
`DataTierAllocationDecider`. This special-purpose allocation setting lets a user specify a
preference-based list of tiers for an index to be assigned to. For example, if the setting were set
to:
```
"index.routing.allocation.prefer._tier": "data_hot,data_warm,data_content"
```
If the cluster contains any nodes with the `data_hot` role, the decider will only allow the
index's shards to be allocated on the `data_hot` node(s). If there are no `data_hot` nodes, but there are `data_warm` and
`data_content` nodes, then the index will be allowed to be allocated on `data_warm` nodes.
This allows us to specify an index's preference for tier(s) without causing the index to be
unassigned if no nodes of a preferred tier are available.
Subsequent work will change the ILM migration to make additional use of this setting.
Relates to #60848
The autoscaling decision API now returns an absolute capacity,
and leaves the actual decision of whether a scale up or down
is needed to the orchestration system.
The decision API now returns both a tier and node level required
and current capacity, as well as a decider level breakdown of the
same, though with, in particular, current memory still not populated.
We removed index-time boosting back in 5x, and we no longer document the 'boost'
parameter on any of our mapping types. However, it is still possible to define an
index-time boost on a field mapper for a surprisingly large number of field types, and
they even have an effect (sometimes, on some queries).
As a first step in finally removing all traces of index time boosting, this commit emits
a deprecation warning whenever a boost parameter is found on a mapping definition.
This commit adjusts the following APIs so that they now support not only an `_all` case, but wildcard patterned IDs as well.
- `GET _ml/calendars/<calendar_id>/events`
- `GET _ml/calendars/<calendar_id>`
- `GET _ml/anomaly_detectors/<job_id>/model_snapshots/<snapshot_id>`
- `DELETE _ml/anomaly_detectors/<job_id>/_forecast/<forecast_id>`
Adds a new flag, include, to the get trained models API.
The flag initially has two valid values: definition, total_feature_importance.
Consequently, the old include_model_definition flag is now deprecated.
When total_feature_importance is included, the total_feature_importance field is included in the model metadata object.
Including definition is the same as previously setting include_model_definition=true.
The underlying issue was fixed a while ago in Lucene:
https://issues.apache.org/jira/browse/LUCENE-9517
and went away when the Lucene snapshot version was upgraded.
Also the name of the index to roll over had to be slightly changed,
so that it doesn't collide with the data stream template's namespace.
(A regular index can't be created in a namespace that is managed
by a template that creates data streams.)
Closes #62043
This commit changes the default allocation on the "hot" tier to allocating the newly created index
to the "hot" tier if it is part of a new or existing data stream, and to the "content" tier if it is
not part of a data stream.
Overriding any of the index.routing.allocation.(include|exclude|require).* settings continues to
cause the initial allocation not to be set (no change in behavior).
Relates to #60848
This adds ILM support for automatically migrating the managed
indices between data tiers.
This proposal makes use of a MigrateAction that is injected
(similar to how the Unfollow action is injected) in phases that
don't define index allocation rules using the AllocateAction or
don't explicitly define the MigrateAction itself (regardless of whether it's
enabled or disabled).
With the differentiation between searchable snapshots on the cold phase and searchable snapshots on
the frozen phase not implemented, there is no need to have a separate phase/tier for now. This
commit removes the frozen phase and tier, which can be added back at a later time.
(this tier was never in a released version, so this is not a breaking change)
Relates to #60983
Relates to #60994
Relates to #60848
This adds two extra bits of info to the profiler:
1. Count of the number of different types of collectors. This lets us figure
out if we're using the optimization for segment ordinals. It adds a few
more similar counters just for good measure.
2. Profiles the `getLeafCollector` and `postCollection` methods. These are
non-trivial for some aggregations, like cardinality.
* Add "synthetics-*-*" templates for synthetics fleet data
For the Elastic Agent we currently have `logs` and `metrics`; however, synthetic data doesn't belong
with those and thus we should have a place for it to live. This would be data reported from
heartbeat and under the 'monitoring' category.
This commit adds a composable index template for `synthetics-*-*` indices similar to the work in
#56709 and #57629.
Resolves #61665
This PR adds support for the 'fields' option in the following places:
* Anytime `inner_hits` is used, for both fetching nested/ child docs and field collapsing
* The `top_hits` aggregation
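A sketch of the option inside `top_hits` (field names and patterns are illustrative):
```
GET my-index/_search
{
  "size": 0,
  "aggs": {
    "by_host": {
      "terms": { "field": "host.name" },
      "aggs": {
        "latest": {
          "top_hits": {
            "size": 1,
            "sort": [ { "@timestamp": "desc" } ],
            "fields": [ "http.response.*", "user.name" ]
          }
        }
      }
    }
  }
}
```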
Addresses #61949.
This commit deprecates the Repository Stats API added in 7.8.0 as
an experimental API behind a feature flag. The goal is to deprecate
this API in 7.10.0 and remove it in a follow up PR in 8.0.0.
This API is now superseded by the Repositories Metering API.
This commit removes the documentation for some specific Searchable Snapshot REST APIs:
- clear cache
- searchable snapshot stats
- repository stats
These APIs are low-level and are useful to investigate the behavior of snapshot
backed indices but we expect them to be removed in the future or to appear in
a different form.
Previously the "mappings" field of the response from the
find_file_structure endpoint was not a drop-in replacement for the
mappings format of the create index endpoint - the
"properties" layer was missing. The reason for omitting
it initially was that the assumption was that the
find_file_structure endpoint would only ever return very
simple mappings without any nested objects. However,
this will not be true in the future, as we will improve
mappings detection for complex JSON objects. As a first
step it makes sense to move the returned mappings closer
to the standard format.
This is a small building block towards fixing #55616
This pull request adds a new set of APIs that allows tracking the number of requests performed
by the different registered repositories.
In order to avoid losing data, the repository statistics are archived for a configurable
retention period, `repositories.stats.archive.retention_period`, after the repository is closed. The API exposes the
statistics for the active repositories as well as the modified/closed repositories.