Commit Graph

493 Commits

Author SHA1 Message Date
David Kyle 7daed3b8af
Pipeline Inference Aggregation (#58193)
Adds a pipeline aggregation that loads a model and performs inference on the 
input aggregation results.
2020-07-02 14:33:02 +01:00
Nik Everett 32bdf8549b
Fail variable_width_histogram that collects from many (#58619)
Adds an explicit check to `variable_width_histogram` to stop it from
trying to collect from many buckets because it can't. I tried to make it
do so but that is more than an afternoon's project, sadly. So for now we
just disallow it.

Relates to #42035
2020-06-30 15:42:46 -04:00
Nik Everett dda78ff760
Docs: Mark variable_width_histogram experimental (#58574)
We're tracking this aggregation's experimental-progress in #58573. We'd
like a little time to be able to make backwards incompatible changes to
the aggregation because we're not 100% sure about the request and
response format yet.
2020-06-25 16:54:37 -04:00
James Dorfman e99d287fbb
Add Variable Width Histogram Aggregation (#42035)
Implements a new histogram aggregation called `variable_width_histogram` which
dynamically determines bucket intervals based on document groupings. These
groups are determined by running a one-pass clustering algorithm on each shard
and then reducing each shard's clusters using an agglomerative
clustering algorithm.

This PR addresses #9572.

The shard-level clustering is done in one pass to minimize memory overhead. The
algorithm was lightly inspired by
[this paper](https://ieeexplore.ieee.org/abstract/document/1198387). It fetches
a small number of documents to sample the data and determine initial clusters.
Subsequent documents are then placed into one of these clusters, or a new one
if they are an outlier. This algorithm is described in more details in the
aggregation's docs.

At reduce time, a
[hierarchical agglomerative clustering](https://en.wikipedia.org/wiki/Hierarchical_clustering)
algorithm inspired by [this paper](https://arxiv.org/abs/1802.00304)
continually merges the closest buckets from all shards (based on their
centroids) until the target number of buckets is reached.

The final values produced by this aggregation are approximate. Each bucket's
min value is used as its key in the histogram. Furthermore, buckets are merged
based on their centroids and not their bounds. So it is possible that adjacent
buckets will overlap after reduction. Because each bucket's key is its min,
this overlap is not shown in the final histogram. However, when such overlap
occurs, we set the key of the bucket with the larger centroid to the midpoint
between its minimum and the smaller bucket’s maximum:
`min[large] = (min[large] + max[small]) / 2`. This heuristic is expected to
increases the accuracy of the clustering.

Nodes are unable to share centroids during the shard-level clustering phase. In
the future, resolving https://github.com/elastic/elasticsearch/issues/50863
would let us solve this issue. 

It doesn’t make sense for this aggregation to support the `min_doc_count`
parameter, since clusters are determined dynamically. The `order` parameter is
not supported here to keep this large PR from becoming too complex.
2020-06-23 09:26:54 -04:00
Cris da Rocha b5de14d3f6
Missing comma between value types (#58383)
This applies to all versions of this document (7.7, 7.8, 7.x, current and master).
2020-06-19 23:01:25 +02:00
Tal Levy c765993d82
add geo_shape documentation for supported aggregations (#58284)
This commit adds documentation for geo_shape fields in aggregations

Closes #55495.
2020-06-18 10:17:49 -07:00
James Rodewig 7826bbee87
[DOCS] Move search API's `docvalue_fields` examples (#57760)
Changes:

* Condenses and relocates the `docvalue_fields` example to the 'Run a search' 
   page.
* Adds docs for the `docvalue_fields` request body parameter.
* Updates several related xrefs.

Co-authored-by: debadair <debadair@elastic.co>
2020-06-11 10:57:15 -04:00
andrewjohnson2 a791d6723d
Added standard deviation / variance sampling to extended stats (#49782)
Per 49554 I added standard deviation sampling and variance sampling to the extended stats interface.

Closes #49554

Co-authored-by: Igor Motov <igor@motovs.org>
2020-06-10 15:00:50 -04:00
James Rodewig 51e3d5ab63
[DOCS] Fix source filtering xrefs (#57720) 2020-06-05 08:46:26 -04:00
Igor Motov 29b5643c1a
Increase search.max_buckets to 65,535 (#57042)
Increases the default search.max_buckets limit to 65,535, and only counts
buckets during reduce phase.

Closes #51731
2020-06-03 11:54:48 -04:00
Benjamin Trent 484de0cd02
Adding transform docs for geotile_grid (#57000)
transforms and composite aggs support geotile_grid as a source. This adds documentation explaining that support.
2020-06-01 15:32:18 -04:00
Nik Everett 1e5e5e2da2
Update date_histogram docs (#56922)
* Make it more clear that you can use `month` or `1M`.
* Explain rounding rules
* Consistently use "time zone" instead of "timezone". It looks like both
  are right but I see "time zone" much more. And the parameter in
  elasticsearch is `time_zone` so we may as well line up.

Closes #56760

Co-authored-by: James Rodewig <james.rodewig@elastic.co>
2020-05-29 17:13:14 -04:00
Gabriel Petrovay 709ee956d7 Fixed calendar intervals documentation (#56666)
- the 1-letter intervals are not parseable (`m`, `h`, `d`, `w`,  `M`, `q`, `y`)
- fixed formatting broken by new lines
2020-05-15 16:56:27 -04:00
Gil Raphaelli f29c9ff652 [DOCS] Sort metric and pipeline agg docs (#56613) 2020-05-15 16:34:47 -04:00
Tal Levy 79367e43da
Add Normalize Pipeline Aggregation (#56399)
This aggregation will perform normalizations of metrics
for a given series of data in the form of bucket values.

The aggregations supports the following normalizations

- rescale 0-1
- rescale 0-100
- percentage of sum
- mean normalization
- z-score normalization
- softmax normalization

To specify which normalization is to be used, it can be specified
in the normalize agg's `normalizer` field.

For example:

```
{
  "normalize": {
    "buckets_path": <>,
    "normalizer": "percent"
  }
}
```

Closes #51005.
2020-05-14 13:32:42 -07:00
Gabriel Petrovay 4029818c24 [Docs] Correct formatting in datehistogram-aggregation.asciidoc (#56664) 2020-05-13 12:02:36 +02:00
Ignacio Vera 4e39184c38
Add moving percentiles pipeline aggregation (#55441)
Similar to what the moving function aggregation does, except merging windows of percentiles 
sketches together instead of cumulatively merging final metrics
2020-05-12 10:30:52 +02:00
James Rodewig af2d13144f
[DOCS] Add reference docs for `search.max_buckets` setting (#56449)
Adds reference-style setting documentation for the `search.max_buckets`
setting.

This setting was previously only documented on the [bucket
aggregations][0] page.

[0]: https://www.elastic.co/guide/en/elasticsearch/reference/master/search-aggregations-bucket.html
2020-05-11 08:35:24 -04:00
Christos Soulios caf6c5ac19
Histogram field type support for ValueCount and Avg aggregations (#55933)
Implements value_count and avg aggregations over Histogram fields as discussed in #53285

- value_count returns the sum of all counts array of the histograms
- avg computes a weighted average of the values array of the histogram by multiplying each value with its associated element in the counts array
2020-05-04 10:24:35 +03:00
AB Prashanth 785527bb58
[DOCS] Remove approximate document counts example from term agg docs (#55442)
Removes an example from the "Document counts are approximate" section of the
terms agg documentation.

As #52377 details, the example was no longer accurate in 7.x or 6.8. Document
counts were more precise than the example presented.

We've opened issue #56025 to discuss re-adding an example later.

Co-authored-by: James Rodewig <james.rodewig@elastic.co>
2020-04-30 09:49:32 -04:00
Christos Soulios cefc6af25b
Histogram field type support for Sum aggregation (#55681)
Implements Sum aggregation over Histogram fields by summing the value of each bucket multiplied by their count as requested in #53285
2020-04-29 11:09:25 +03:00
Zachary Tong 9f165bd44e
Aggs must specify a `field` or `script` (or both) (#52226)
* Aggs must specify a `field` or `script` (or both)

This adds a validation to VSParserHelper to ensure that a field or
script or both are specified by the user.  This is technically
required today already, but throws an exception much deeper
in the agg framework and has a very unintuitive error for the user
(as well as eating more resources instead of failing early)

* Fix StringStats test

* Add yaml test

* Skip test on older versions

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
2020-04-23 14:26:38 -04:00
Igor Motov 6d28596ead
Add support for filters to T-Test aggregation (#54980)
Adds support for filters to T-Test aggregation. The filters can be used to
select populations based on some criteria and use values from the same or
different fields.

Closes #53692
2020-04-10 10:19:07 -04:00
Igor Motov 5fc9fc528d
Add Student's t-test aggregation support (#54469)
Adds t_test metric aggregation that can perform paired and unpaired two-sample
t-tests. In this PR support for filters in unpaired is still missing. It will
be added in a follow-up PR.

Relates to #53692
2020-04-03 11:31:13 -04:00
Gil Raphaelli 4090568797
[DOCS] Fix typos in top metrics agg docs (#54299) 2020-03-27 10:48:01 -04:00
Paweł Krześniak de1229cc2b
[DOCS] link fix (#53973)
Fix bad link in top_metrics.
2020-03-23 13:28:43 -04:00
Zachary Tong 84a59f8447
Add scripting, supported-type tests to ValueCount (#53500)
Also adds a few small notes to the documentation regarding potentially
unintuitive behavior
2020-03-16 15:15:25 -04:00
Lisa Cawley 4a5feab88d
[DOCS] Add anchors for scripted metric aggregations (#53618) 2020-03-16 12:14:01 -07:00
Nik Everett 230a9a8975
Improve top_metrics docs (#53521)
* Removes experimental.
* Replaces `"v"` (for value) with `"m"` (for metric).
* Move the note about tiebreaking into the list of limitations of the
  sort.
* Explain how you ask for `metrics`.
* Clean up some wording.
* Link to the docs from `top_metrics`.

Closes #51813
2020-03-16 13:23:22 -04:00
Nik Everett 8410356c5b
Preserve metric types in top_metrics (#53288)
This changes the `top_metrics` aggregation to return metrics in their
original type. Since it only supports numerics, that means that dates,
longs, and doubles will come back as stored, with their appropriate
formatter applied.
2020-03-11 16:44:08 -04:00
Anton Dollmaier e9c8c03fee [DOCS] Fix parameter formatting for GeoHash grid agg docs (#53032)
Adds missing colon (`:`) to the parameter definition list.
2020-03-09 08:17:57 -04:00
Nik Everett 56058ab6af
Support multiple metrics in `top_metrics` agg (#52965)
This adds support for returning multiple metrics to the `top_metrics`
agg. It looks like:
```
POST /test/_search?filter_path=aggregations
{
  "aggs": {
    "tm": {
      "top_metrics": {
        "metrics": [
          {"field": "v"},
          {"field": "m"}
        ],
        "sort": {"s": "desc"}
      }
    }
  }
}
```
2020-03-05 06:53:37 -05:00
Nik Everett f4223b6a8f
Add size support to `top_metrics` (#52662)
This adds support for returning the top "n" metrics instead of just the
very top.

Relates to #51813
2020-02-27 11:14:57 -05:00
István Zoltán Szabó 14555ca01e
[DOCS] Links transforms in aggregation docs (#52563)
Co-authored-by: Lisa Cawley <lcawley@elastic.co>
2020-02-21 08:22:04 +01:00
Nik Everett 5b2266601b
Implement top_metrics agg (#51155)
The `top_metrics` agg is kind of like `top_hits` but it only works on
doc values so it *should* be faster.

At this point it is fairly limited in that it only supports a single,
numeric sort and a single, numeric metric. And it only fetches the "very
topest" document worth of metric. We plan to support returning a
configurable number of top metrics, requesting more than one metric and
more than one sort. And, eventually, non-numeric sorts and metrics. The
trick is doing those things fairly efficiently.

Co-Authored by: Zachary Tong <zach@elastic.co>
2020-02-14 07:13:52 -05:00
Igor Motov 0898df4aac
Add histogram field type support to boxplot aggs (#52265)
Add support for the histogram field type to boxplot aggs.

Closes #52233
Relates to #33112
2020-02-13 08:59:44 -05:00
Igor Motov c50cfa0668
Add Boxplot Aggregation (#51948)
Adds a `boxplot` aggregation that calculates min, max, medium and the first
and the third quartiles of the given data set.

Closes #33112
2020-02-07 18:01:20 -05:00
Mark Tozzi 928c663ce0
Fix dangling 'either' in weighted average docs (#51748) 2020-01-31 12:45:46 -05:00
Elvis Saravia 520da54e63
update pipeline.asciidoc
typo
2020-01-24 14:03:01 +01:00
Igor Motov 23be11cf6c
Fix leftover mentions of method parameter in Percentile Aggs (#51272)
The method parameter is not used in the percentile aggs, instead
the method is determined by the presence of `hdr` or `tdigest`
objects.

Relates to #8324
2020-01-22 05:02:48 -10:00
Tal Levy 6c86606d2a
Adds support for geo-bounds filtering in geogrid aggregations (#50002)
It is fairly common to filter the geo point candidates in
geohash_grid and geotile_grid aggregations according to some
viewable bounding box. This change introduces the option of
specifying this filter directly in the tiling aggregation.

This is even more relevant to `geo_shape` where the bounds will restrict
the shape to be within the bounds

this optional `bounds` parameter is parsed in an equivalent fashion to 
the bounds specified in the geo_bounding_box query.
2020-01-14 08:29:10 -08:00
Nik Everett 326d696d9a
Support offset in composite aggs (#50609)
Adds support for the `offset` parameter to the `date_histogram` source
of composite aggs. The `offset` parameter is supported by the normal
`date_histogram` aggregation and is useful for folks that need to
measure things from, say, 6am one day to 6am the next day.

This is implemented by creating a new `Rounding` that knows how to
handle offsets and delegates to other rounding implementations. That
implementation doesn't fully implement the `Rounding` contract, namely
`nextRoundingValue`. That method isn't used by composite aggs so I can't
be sure that any implementation that I add will be correct. I propose to
leave it throwing `UnsupportedOperationException` until I need it.

Closes #48757
2020-01-07 14:49:09 -05:00
James Rodewig 7f35bcdfc9
[DOCS] Warn about using `geo_centroid` as sub-agg to `geohash_grid` (#50038)
If `geo_point fields` are multi-valued, using `geo_centroid` as a
sub-agg to `geohash_grid` could result in centroids outside of bucket
boundaries.

This adds a related warning to the geo_centroid agg docs.
2020-01-06 07:45:49 -06:00
Nik Everett a7cc0b0159
Docs: Refine note about `after_key` (#50475)
* Docs: Refine note about `after_key`

I was curious about composite aggregations, specifically I wanted to
know how to write a composite aggregation that had all of its buckets
filtered out so you *had* to use the `after_key`. Then I saw that we've
declared composite aggregations not to work with pipelines in #44180. So
I'm not sure you *can* do that any more. Which makes the note about
`after_key` inaccurate. This rejiggers that section of the docs a little
so it is more obvious that you send the `after_key` back to us. And so
it is more obvious that you should *only* use the `after_key` that we
give you rather than try to work it out for yourself.

* Apply suggestions from code review

Co-Authored-By: James Rodewig <james.rodewig@elastic.co>

Co-authored-by: James Rodewig <james.rodewig@elastic.co>
2020-01-02 10:02:55 -05:00
James Rodewig 3460dc9542
[DOCS] Percentile aggs are non-deterministic (#50468)
Percentile aggregations are non-deterministic. A percentile aggregation
can produce different results even when using the same data.

Based on [this discuss post][0], the non-deterministic property stems
from processes in Lucene that can affect the order in which docs are
provided to the aggregation.

This adds a warning stating that the aggregation is non-deterministic
and what that means.

[0]: https://discuss.elastic.co/t/different-results-for-same-query/111757
2019-12-23 13:11:31 -05:00
Florian Kelbert 0778c34630 [DOCS] Fix typo in bucket sum aggregation docs (#50431) 2019-12-20 08:47:24 -05:00
Lisa Cawley 6d608e6a0d
[DOCS] Move transform resource definitions into APIs (#50108) 2019-12-17 09:01:31 -08:00
Jim Ferenczi 804a5042e7
Optimize composite aggregation based on index sorting (#48399)
Co-authored-by: Daniel Huang <danielhuang@tencent.com>

This is a spinoff of #48130 that generalizes the proposal to allow early termination with the composite aggregation when leading sources match a prefix or the entire index sort specification.
In such case the composite aggregation can use the index sort natural order to early terminate the collection when it reaches a composite key that is greater than the bottom of the queue.
The optimization is also applicable when a query other than match_all is provided. However the optimization is deactivated for sources that match the index sort in the following cases:
  * Multi-valued source, in such case early termination is not possible.
  * missing_bucket is set to true
2019-12-17 14:02:06 +01:00
James Rodewig 2d9ee5ddfe
[DOCS] Correct percentile rank agg example response (#50052)
The example snippets in the percentile rank agg docs use a test dataset
named `latency`, which is generated from docs/gradle.build.

At some point the dataset and example snippets were updated, but the
text surrounding the snippets was not. This means the text and the
example snippets shown no longer match up.

This corrects that by changing the snippets using /TESTRESPONSE magic comments.
2019-12-12 08:38:48 -05:00
Ignacio Vera eade4f03f4
New Histogram field mapper that supports percentiles aggregations. (#48580)
This commit adds  a new histogram field mapper that consists in a pre-aggregated format of numerical data to be used in percentiles aggregations.
2019-11-28 13:58:20 +01:00
Przemko Robakowski 04f6b6fdb2
[DOCS] IDs for doc snippets (#49008)
* Ids for docs snippets

* Ids for tests

* Ids for docs snippets

* ignoring build folder from idea

* Ignoring build-eclipse
2019-11-25 15:30:00 +01:00
Lisa Cawley a4efab6ab4
[DOCS] Merge rollup config details into API (#49412) 2019-11-22 08:31:30 -08:00
Christos Soulios b0e12c936b
Implement stats aggregation for string terms (#47468)
This PR adds a new metric aggregation called string_stats that operates on string terms of a document and returns the following:

    min_length: The length of the shortest term
    max_length: The length of the longest term
    avg_length: The average length of all terms
    distribution: The probability distribution of all characters appearing in all terms
    entropy: The total Shannon entropy value calculated for all terms

This aggregation has been implemented as an analytics plugin.
2019-11-14 16:07:54 +02:00
James Rodewig f53eba024b
[DOCS] Remove binary gendered language (#48362) 2019-10-23 09:36:31 -05:00
Ian Danforth 24cf883792 [DOCS] Fix typo in percentile rank aggregation docs (#47247) 2019-10-15 15:56:32 -04:00
Alan Woodward 566e1b7d33
Remove type field from DocWriteRequest and associated Response objects (#47671)
This commit removes the type field from index, update and delete requests, and their
associated responses.

Relates to #41059
2019-10-11 10:23:55 +01:00
Alan Woodward 7a622f024f
Remove types from BulkRequest (#46983)
This commit removes types entirely from BulkRequest, both as a global
parameter and as individual entries on update/index/delete lines.

Relates to #41059
2019-10-07 13:29:12 +01:00
Mark Tozzi c26ce1d7f5
DocValueFormat implementation for date range fields (#47472) 2019-10-04 16:01:28 -04:00
Mark Tozzi 57a679fbbb
Documentation notes for Range field histograms (#46890) 2019-10-01 10:46:04 -04:00
Alan Woodward c1f99e2d75
Remove `_type` from SearchHit (#46942)
This commit removes the `_type` field from all search hit responses.

Relates to #41059
2019-09-23 19:14:54 +01:00
Javier Ruiz e8dac62a4a [DOCS] Fix calendar interval typos for date histo agg (#46911) 2019-09-20 15:22:04 -04:00
James Rodewig 370e434986
[DOCS] Correct several [source,console-result] snippets (#46930) 2019-09-20 11:23:15 -04:00
markharwood dc0abec595
Remove Adjacency_matrix setting in favour of Lucene Boolean query clause setting (#46327)
Closes #46324
2019-09-19 16:48:04 +01:00
Philipp Krenn 7c5adcc7c1 Minor improvement to the nested aggregation docs (#46475)
* Minor improvement to the nested aggregation docs

* The attributes name and resellers.name were rather confusing,
  especially since the first one was dynamically mapped and not shown
  in the documentation (you had to read the test to see it). This
  change introduces a unique name for the nested attribute and adds
  the example document to the documentation.
* Change the index name from "index" to something more speaking.

* Update docs/reference/aggregations/bucket/nested-aggregation.asciidoc

Co-Authored-By: James Rodewig <james.rodewig@elastic.co>

* Update docs/reference/aggregations/bucket/nested-aggregation.asciidoc

Co-Authored-By: James Rodewig <james.rodewig@elastic.co>

* Update docs/reference/aggregations/bucket/nested-aggregation.asciidoc

Co-Authored-By: James Rodewig <james.rodewig@elastic.co>
2019-09-11 11:23:39 -04:00
James Rodewig e43be90e6c
[DOCS] [5 of 5] Change // TESTRESPONSE comments to [source,console-results] (#46449) 2019-09-06 14:05:36 -04:00
James Rodewig 466c59a4a7
[DOCS] Replace "// TESTRESPONSE" magic comments with "[source,console-result] (#46295) 2019-09-05 16:47:18 -04:00
James Rodewig f5827ba0ae
[DOCS] Replace "// CONSOLE" comments with [source,console] (#46159) 2019-09-04 12:51:02 -04:00
Zachary Tong 758f7999b7
Add CumulativeCard pipeline agg to pipeline index (#46279)
The Cumulative Cardinality docs weren't linked
from the pipeline index page
2019-09-03 12:10:34 -04:00
Zachary Tong 273c35f79c
Add Cumulative Cardinality agg (and Data Science plugin) (#43661)
This adds a pipeline aggregation that calculates the cumulative
cardinality of a field.  It does this by iteratively merging in the
HLL sketch from consecutive buckets and emitting the cardinality up
to that point.

This is useful for things like finding the total "new" users that have
visited a website (as opposed to "repeat" visitors).

This is a Basic+ aggregation and adds a new Data Science plugin
to house it and future advanced analytics/data science aggregations.
2019-08-26 10:43:24 -04:00
LHearen 6dadce1112 [DOCS] Correct conditional clause in histogram agg docs (#45643) 2019-08-19 10:09:10 -04:00
LHearen d1c0ea7833 [DOCS] Fix a 'value' -> 'values' typo in histogram aggregation docs (#45642) 2019-08-19 10:02:44 -04:00
Zachary Tong ae7c071ec7
Allow pipeline aggs to select specific buckets from multi-bucket aggs (#44179)
This adjusts the `buckets_path` parser so that pipeline aggs can
select specific buckets (via their bucket keys) instead of fetching
the entire set of buckets.  This is useful for bucket_script in
particular, which might want specific buckets for calculations.

It's possible to workaround this with `filter` aggs, but the workaround
is hacky and probably less performant.

- Adjusts documentation
- Adds a barebones AggregatorTestCase for bucket_script
- Tweaks AggTestCase to use getMockScriptService() for reductions and
pipelines.  Previously pipelines could just pass in a script service
for testing, but this didnt work for regular aggs.  The new
getMockScriptService() method fixes that issue, but needs to be used
for pipelines too.  This had a knock-on effect of touching MovFn,
AvgBucket and ScriptedMetric
2019-08-05 12:15:42 -04:00
Nikita Glashenko ead4eb5209 Add more flexibility to MovingFunction window alignment (#44360)
Introduce shift field to MovingFunction aggregation.

By default, shift = 0. Behavior, in this case, is the same as before.
Increasing shift by 1 moves starting window position by 1 to the right.

    To simply include current bucket to the window, use shift = 1
    For center alignment (n/2 values before and after the current bucket), use shift = window / 2
    For right alignment (n values after the current bucket), use shift = window.
2019-08-02 15:09:48 -04:00
Flavio Pompermaier e66889635d [DOCS] Correct sum_other_doc_count value in terms agg example (#45028)
Closes issue #41902
2019-07-31 14:10:05 -04:00
Sandeep Kanabar 0e4be837db [Docs] Update daterange-aggregation.asciidoc (#44730)
Correcting the value to be the same as that specified for "missing".
2019-07-29 12:51:15 +02:00
James Rodewig ea1adb61c2
[DOCS] Update anchors and links for Elasticsearch API relocation (#44500) 2019-07-19 09:16:35 -04:00
Zachary Tong eac86c9bb8
Document that pipeline aggs are not compatible with composite agg (#44180) 2019-07-12 12:34:34 -04:00
Zachary Tong 3e1f73ffa3
Link rare_terms docs from index page (#43882)
Docs for rare_terms were added in #35718, but neglected to
link it from the bucket index page
2019-07-02 13:10:46 -04:00
Zachary Tong baf155dced
Add RareTerms aggregation (#35718)
This adds a `rare_terms` aggregation.  It is an aggregation designed
to identify the long-tail of keywords, e.g. terms that are "rare" or
have low doc counts.

This aggregation is designed to be more memory efficient than the
alternative, which is setting a terms aggregation to size: LONG_MAX
(or worse, ordering a terms agg by count ascending, which has
unbounded error).

This aggregation works by maintaining a map of terms that have
been seen. A counter associated with each value is incremented
when we see the term again.  If the counter surpasses a predefined
threshold, the term is removed from the map and inserted into a cuckoo
filter.  If a future term is found in the cuckoo filter we assume it
was previously removed from the map and is "common".

The map keys are the "rare" terms after collection is done.
2019-07-01 10:02:36 -04:00
Paul Sanwald 6357857bba
Adds a minimum interval to `auto_date_histogram`. (#42814)
Adds a minimum interval to `auto_date_histogram`. We do this by
restricting the roundings passed into to the aggregator.
2019-06-11 15:53:19 -04:00
Zachary Tong 0192fe7d7c
Add documentation for calendar/fixed intervals (#41919)
Original PR missed documentation for the new calendar/fixed
intervals.  This adds the missing documentation
2019-05-10 15:27:41 -04:00
Zachary Tong 290c8b8256
Force selection of calendar or fixed intervals in date histo agg (#33727)
The date_histogram accepts an interval which can be either a calendar 
interval (DST-aware, leap seconds, arbitrary length of months, etc) or 
fixed interval (strict multiples of SI units). Unfortunately this is inferred
by first trying to parse as a calendar interval, then falling back to fixed
if that fails.

This leads to confusing arrangement where `1d` == calendar, but 
`2d` == fixed.  And if you want a day of fixed time, you have to 
specify `24h` (e.g. the next smallest unit).  This arrangement is very
error-prone for users.

This PR adds `calendar_interval` and `fixed_interval` parameters to any
code that uses intervals (date_histogram, rollup, composite, datafeed, etc).
Calendar only accepts calendar intervals, fixed accepts any combination of
units (meaning `1d` can be used to specify `24h` in fixed time), and both
are mutually exclusive.  

The old interval behavior is deprecated and will throw a deprecation warning.
It is also mutually exclusive with the two new parameters. In the future the 
old dual-purpose interval will be removed.

The change applies to both REST and java clients.
2019-05-06 17:17:11 -04:00
James Rodewig adf67053f4
[DOCS] Add anchors for Asciidoctor migration (#41648) 2019-04-30 10:19:09 -04:00
Ignacio Vera cc48427e05
Improve accuracy for Geo Centroid Aggregation (#41033)
keeps the partial results as doubles and uses Kahan summation to help reduce floating point errors.
2019-04-25 08:06:55 +02:00
Zachary Tong 5ccc0b5a32
Disallow null/empty or duplicate composite sources (#41359)
Adds some validation to prevent duplicate source names from being 
used in the composite agg.

Also refactored to use a ConstructingObjectParser and removed the 
private ctor and setter for sources, making it mandatory.
2019-04-24 13:22:06 -04:00
Jason Tedor 656cc709b2
Fix intervals section of auto date-histogram docs (#41203)
This section should be at the same sub-level as other sections in the
auto date-histogram docs, otherwise it is rendered on to another page
and is confusing for users to understand what it's in reference to.
2019-04-15 11:27:38 -04:00
Antonio Matarrese badb8559fb Use the breadth first collection mode for significant terms aggs.
This helps avoid memory issues when computing deep sub-aggregations. Because it
should be rare to use sub-aggregations with significant terms, we opted to always
choose breadth first as opposed to exposing a `collect_mode` option.

Closes #28652.
2019-04-11 15:38:25 -07:00
Lisa Cawley 13454376a4
[DOCS] Fixes callout for Asciidoctor migration (#41127) 2019-04-11 12:04:04 -07:00
Zachary Tong 6f0f8ab4bc
Remove MovingAverage pipeline aggregation (#39328)
This was deprecated in 6.4.0 and for the entirety of 7.0.  Removed
in 8.0
2019-03-19 15:31:05 -04:00
Ian 98bbb4176e Correct date in daterange-aggregation.asciidoc (#39727) 2019-03-06 11:30:17 +01:00
Samuel Cifuentes García ff6ffe8ba1 Improved Terms Aggregation documentation (#38892)
Added a note after the first query example talking about fielddata.
2019-03-05 10:17:01 -05:00
Hannes Van De Vreken b76a380f18 Fix typo in DateRange docs (yyy → yyyy) (#38883) 2019-02-15 10:20:16 -05:00
Alexander Reelsen 5f7168ea74
Remove joda time mentions in documentation (#38720)
This is the forward port of #38720 (not containing the 7.0 migration docs)
2019-02-14 10:18:48 +01:00
Yuri Astrakhan f133bf4ed8
add geotile_grid ref to asciidoc (#38632) 2019-02-08 11:37:35 -05:00
Yuri Astrakhan f3cde06a1d
geotile_grid implementation (#37842)
Implements `geotile_grid` aggregation

This patch refactors previous implementation https://github.com/elastic/elasticsearch/pull/30240

This code uses the same base classes as `geohash_grid` agg, but uses a different hashing
algorithm to allow zoom consistency.  Each grid bucket is aligned to Web Mercator tiles.
2019-01-31 19:11:30 -05:00
Julie Tibshirani 9ca26b7e63
Remove more references to type in docs. (#37946)
* Update the top-level 'getting started' guide.
* Remove custom types from the painless getting started documentation.
* Fix an incorrect references to '_doc' in the cardinality query docs.
* Update the _update docs to use the typeless API format.
2019-01-29 10:51:07 -08:00
Jim Ferenczi cb451edb01
Allow nested fields in the composite aggregation (#37178)
This changes adds the support to handle `nested` fields in the `composite`
aggregation. A `nested` aggregation can be used as parent of a `composite`
aggregation in order to target `nested` fields in the `sources`.

Closes #28611
2019-01-25 14:00:39 +01:00
Christoph Büscher 967de04257
Uppercasing some docs section title (#37781)
Section titles are mostly uppercase, only a few cases where query DSL parameters
or Java method names are used as the title they should be lowercased.
2019-01-24 22:54:55 +01:00
Christoph Büscher 95a6951f78
Use new bulk API endpoint in the docs (#37698)
This change switches to using the typeless bulk API endpoint in the
documentation snippets where possible
2019-01-23 09:46:28 +01:00
Boaz Leskes 52ba407931
Expose sequence number and primary terms in search responses (#37639)
Users may require the sequence number and primary terms to perform optimistic concurrency control operations. Currently, you can get the sequence number via the `docvalues_fields` API but the primary term is not accessible because it is maintained by the `SeqNoFieldMapper` and the infrastructure can't find it. 

This commit adds a dedicated sub fetch phase to return both numbers that is connected to a new `seq_no_primary_term` parameter.
2019-01-23 09:01:58 +01:00
Christoph Büscher 34f2d2ec91
Remove remaining occurances of "include_type_name=true" in docs (#37646) 2019-01-22 15:13:52 +01:00
Christoph Büscher 3a96608b3f
Remove more include_type_name and types from docs (#37601) 2019-01-18 14:11:18 +01:00
Christoph Büscher 25aac4f77f
Remove `include_type_name` in asciidoc where possible (#37568)
The "include_type_name" parameter was temporarily introduced in #37285 to facilitate
moving the default parameter setting to "false" in many places in the documentation
code snippets. Most of the places can simply be reverted without causing errors.
In this change I looked for asciidoc files that contained the
"include_type_name=true" addition when creating new indices but didn't look
likey they made use of the "_doc" type for mappings. This is mostly the case
e.g. in the analysis docs where index creating often only contains settings. I
manually corrected the use of types in some places where the docs still used an
explicit type name and not the dummy "_doc" type.
2019-01-18 09:34:11 +01:00
Julie Tibshirani 36a3b84fc9
Update the default for include_type_name to false. (#37285)
* Default include_type_name to false for get and put mappings.

* Default include_type_name to false for get field mappings.

* Add a constant for the default include_type_name value.

* Default include_type_name to false for get and put index templates.

* Default include_type_name to false for create index.

* Update create index calls in REST documentation to use include_type_name=true.

* Some minor clean-ups around the get index API.

* In REST tests, use include_type_name=true by default for index creation.

* Make sure to use 'expression == false'.

* Clarify the different IndexTemplateMetaData toXContent methods.

* Fix FullClusterRestartIT#testSnapshotRestore.

* Fix the ml_anomalies_default_mappings test.

* Fix GetFieldMappingsResponseTests and GetIndexTemplateResponseTests.

We make sure to specify include_type_name=true during xContent parsing,
so we continue to test the legacy typed responses. XContent generation
for the typeless responses is currently only covered by REST tests,
but we will be adding unit test coverage for these as we implement
each typeless API in the Java HLRC.

This commit also refactors GetMappingsResponse to follow the same appraoch
as the other mappings-related responses, where we read include_type_name
out of the xContent params, instead of creating a second toXContent method.
This gives better consistency in the response parsing code.

* Fix more REST tests.

* Improve some wording in the create index documentation.

* Add a note about types removal in the create index docs.

* Fix SmokeTestMonitoringWithSecurityIT#testHTTPExporterWithSSL.

* Make sure to mention include_type_name in the REST docs for affected APIs.

* Make sure to use 'expression == false' in FullClusterRestartIT.

* Mention include_type_name in the REST templates docs.
2019-01-14 13:08:01 -08:00
Josh Soref edb48321ba [DOCS] Various spelling corrections (#37046) 2019-01-07 14:44:12 +01:00
Igor Motov d6acd8e15f
Docs: add clarification about geohash use in geohashgrid agg (#36901)
Adds an example on translating geohashes returned by geohashgrid 
agg as bucket keys into geo bounding box filters in elasticsearch as well
as 3rd party applications.

Closes #36413
2019-01-03 15:40:48 -05:00
Luca Cavanna 42ea644903
Remove single shard optimization when suggesting shard_size (#37041)
When executing terms aggregations we set the shard_size, meaning the
number of buckets to collect on each shard, to a value that's higher than
the number of requested buckets, to guarantee some basic level of
precision. We have an optimization in place so that we leave shard_size
set to size whenever we are searching against a single shard, in which
case maximum precision is guaranteed by definition.

Such optimization requires us access to the total number of shards that
the search is executing against. In the context of cross-cluster search,
once we will introduce multiple reduction steps (one per cluster) each
cluster will only know the number of local shards, which is problematic
as we should only optimize if we are searching against a single shard in a
single cluster. It could be that we are searching against one shard per cluster
in which case the current code would optimize number of terms causing
a loss of precision.

While discussing how to address the CCS scenario, we decided that we do
not want to introduce further complexity caused by this single shard
optimization, as it benefits only a minority of cases, especially when
the benefits are not so great.

This commit removes the single shard optimization, meaning that we will
always have heuristic enabled on how many number of buckets to collect
on the shards, even when searching against a single shard.

This will cause more buckets to be collected when searching against a single
shard compared to before. If that becomes a problem for some users, they
can work around that by setting the shard_size equal to the size.

Relates to #32125
2019-01-02 17:45:49 +01:00
João Barbosa 276726aea2 Added keyed response to pipeline percentile aggregations 22302 (#36392)
Closes #22302
2018-12-14 16:22:54 -05:00
Jim Ferenczi 18866c4c0b
Make hits.total an object in the search response (#35849)
This commit changes the format of the `hits.total` in the search response to be an object with
a `value` and a `relation`. The `value` indicates the number of hits that match the query and the
`relation` indicates whether the number is accurate (in which case the relation is equals to `eq`)
or a lower bound of the total (in which case it is equals to `gte`).
This change also adds a parameter called `rest_total_hits_as_int` that can be used in the
search APIs to opt out from this change (retrieve the total hits as a number in the rest response).
Note that currently all search responses are accurate (`track_total_hits: true`) or they don't contain
`hits.total` (`track_total_hits: true`). We'll add a way to get a lower bound of the total hits in a
follow up (to allow numbers to be passed to `track_total_hits`).

Relates #33028
2018-12-05 19:49:06 +01:00
Jeff Hajewski 49087f16f5 Adds deprecation logging to ScriptDocValues#getValues. (#34279)
`ScriptDocValues#getValues` was added for backwards compatibility but no
longer needed. Scripts using the syntax `doc['foo'].values` when
`doc['foo']` is a list should be using `doc['foo']` instead.

Closes #22919
2018-11-27 14:30:13 -05:00
William Desportes a204d1cdff [Docs] Fix typo in datehistogram-aggregation.asciidoc (#35855) 2018-11-23 15:16:53 +01:00
Jim Ferenczi d96202a282 [DOCS] Fix missing callouts 2018-11-08 15:40:01 +01:00
Dominik Stadler d351422215 Add parent-aggregation to parent-join module (#34210)
Add `parent` aggregation, a special single bucket aggregation that joins children documents to their parent.
2018-11-08 14:13:00 +01:00
Russ Cam 848847d8c7 [Docs] Section header preceded by blank line (#34340) 2018-11-08 12:44:13 +01:00
Sue Gallagher 1ce3c92a2d [DOCS] Add info on calendar vs fixed interval. (#31638)
Extensive edit to add additional information on the difference between calendar intervals and fixed-length intervals.
2018-10-31 10:16:36 -04:00
Andy Bristol b8280ea7cc
median absolute deviation agg (#34482)
This commit adds a new single value metric aggregation that calculates
the statistic called median absolute deviation, which is a measure of
variability that works on more types of data than standard deviation

Our calculation of MAD is approximated using t-digests. In the collect
phase, we collect each value visited into a t-digest. In the reduce
phase, we merge all value t-digests, then create a t-digest of
deviations using the first t-digest's median and centroids
2018-10-30 07:22:52 -07:00
Gordon Brown 794d4fa879
Label required scripts in Scripted Metric Agg docs (#35051)
When combine_script and reduce_script were made into required
parameters for Scripted Metric aggregations in #33452, the docs were
not updated to reflect that. This marks those parameters as required
in the documentation.
2018-10-29 15:13:14 -06:00
Julie Tibshirani f854330e06
Make sure to use the type _doc in the REST documentation. (#34662)
* Replace custom type names with _doc in REST examples.
* Avoid using two mapping types in the percolator docs.
* Rename doc -> _doc in the main repository README.
* Also replace some custom type names in the HLRC docs.
2018-10-22 11:54:04 -07:00
Zachary Tong d981746142
[Docs] clarification about cardinality accuracy (#34616)
Adds a bit more clarification about how accuracy is dependent
on the dataset in question.

Closes #18231
2018-10-22 13:15:45 -04:00
markharwood fe623acf66
Docs - removed experimental/beta markers from adjacency matrix aggregation (#34599) 2018-10-19 09:33:59 +01:00
markharwood 2a413abb0b
Docs - remove experimental marker from significant_text aggregation (#34598) 2018-10-19 09:32:02 +01:00
Jim Ferenczi 36557469f6
[DOCS] Removes beta label from composite aggregation (#34329) 2018-10-05 19:46:20 +02:00
Nik Everett dc2cf28fde
Docs: Allow skipping response assertions (#34240)
We generate tests from our documentation, including assertions about the
responses returned by a particular API. But sometimes we *can't* assert
that the response is correct because of some defficiency in our tooling.
Previously we marked the response `// NOTCONSOLE` to skip it, but this
is kind of odd because `// NOTCONSOLE` is really to mark snippets that
are json but aren't requests or responses. This introduces a new
construct to skip response assertions:
```
// TESTRESPONSE[skip:reason we skipped this]
```
2018-10-04 08:03:38 -04:00
Serge Populov 13af5d5d7f Docs: Fix typo in field name in aggregations (#34223) 2018-10-02 10:54:29 -04:00
ben5556 012b9c7539 Corrected aggregation name to match the example (#33786) 2018-09-17 18:24:43 -07:00
Ryan Ernst 3046656ab1
Scripting: Rework joda time backcompat (#33486)
This commit switches the joda time backcompat in scripting to use
augmentation over ZonedDateTime. The augmentation methods provide
compatibility with the missing methods between joda's DateTime and
java's ZonedDateTime. Due to getDayOfWeek returning an enum in the java
API, ZonedDateTime is wrapped so that the method can return int like the
joda time does. The java time api version is renamed to
getDayOfWeekEnum, which will be kept through 7.x for compatibility while
users switch back to getDayOfWeek once joda compatibility is removed.
2018-09-16 19:18:00 -07:00
Christoph Büscher fe478c23b7
[Docs] Fix heading in composite-aggregation.asciidoc (#33627)
The heading for the "Missing buckets" should be on the same level 
as the the "Order" section.
2018-09-12 16:56:03 +02:00
Jim Ferenczi 7ad71f906a
Upgrade to a Lucene 8 snapshot (#33310)
The main benefit of the upgrade for users is the search optimization for top scored documents when the total hit count is not needed. However this optimization is not activated in this change, there is another issue opened to discuss how it should be integrated smoothly.
Some comments about the change:
* Tests that can produce negative scores have been adapted but we need to forbid them completely: #33309

Closes #32899
2018-09-06 14:42:06 +02:00
Paul Sanwald c303006e6b
Add interval response parameter to AutoDateInterval histogram (#33254)
Adds the interval used to the aggregation response.
2018-09-05 07:35:59 -04:00
lipsill b7c0d2830a [Docs] Remove repeating words (#33087) 2018-08-28 13:16:43 +02:00
Luca Cavanna 393eec1482
Set maxScore for empty TopDocs to Nan rather than 0 (#32938)
We used to set `maxScore` to `0` within `TopDocs` in situations where there is really no score as the size was set to `0` and scores were not even tracked. In such scenarios, `Float.Nan` is more appropriate, which gets converted to `max_score: null` on the REST layer. That's also more consistent with lucene which set `maxScore` to `Float.Nan` when merging empty `TopDocs` (see `TopDocs#merge`).
2018-08-22 17:23:54 +02:00
Ryan Ernst 478f6d6cf1
Scripting: Conditionally use java time api in scripting (#31441)
This commit adds a boolean system property, `es.scripting.use_java_time`,
which controls the concrete return type used by doc values within
scripts. The return type of accessing doc values for a date field is
changed to Object, essentially duck typing the type to allow
co-existence during the transition from joda time to java time.
2018-08-01 08:58:49 -07:00
Colm O'Shea 97b379e0d4 fix no=>not typo (#32463)
Found a tiny typo while reading the docs
2018-07-31 13:33:23 +01:00
Sandeep Kanabar 7ad16ffd84 Docs: Correcting a typo in tophits (#32359) 2018-07-26 13:30:01 -04:00
Zachary Tong 6ba144ae31
Add WeightedAvg metric aggregation (#31037)
Adds a new single-value metrics aggregation that computes the weighted 
average of numeric values that are extracted from the aggregated 
documents. These values can be extracted from specific numeric
fields in the documents.

When calculating a regular average, each datapoint has an equal "weight"; it
contributes equally to the final value.  In contrast, weighted averages
scale each datapoint differently.  The amount that each datapoint contributes 
to the final value is extracted from the document, or provided by a script.

As a formula, a weighted average is the `∑(value * weight) / ∑(weight)`

A regular average can be thought of as a weighted average where every value has
an implicit weight of `1`.

Closes #15731
2018-07-23 18:33:15 -04:00
Paul Sanwald feb07559aa fix typo 2018-07-13 14:59:11 -04:00
Colin Goodheart-Smithe 0edb096eb4 Adds a new auto-interval date histogram (#28993)
* Adds a new auto-interval date histogram

This change adds a new type of histogram aggregation called `auto_date_histogram` where you can specify the target number of buckets you require and it will find an appropriate interval for the returned buckets. The aggregation works by first collecting documents in buckets at second interval, when it has created more than the target number of buckets it merges these buckets into minute interval bucket and continues collecting until it reaches the target number of buckets again. It will keep merging buckets when it exceeds the target until either collection is finished or the highest interval (currently years) is reached. A similar process happens at reduce time.

This aggregation intentionally does not support min_doc_count, offest and extended_bounds to keep the already complex logic from becoming more complex. The aggregation accepts sub-aggregations but will always operate in `breadth_first` mode deferring the computation of sub-aggregations until the final buckets from the shard are known. min_doc_count is effectively hard-coded to zero meaning that we will insert empty buckets where necessary.

Closes #9572

* Adds documentation

* Added sub aggregator test

* Fixes failing docs test

* Brings branch up to date with master changes

* trying to get tests to pass again

* Fixes multiBucketConsumer accounting

* Collects more buckets than needed on shards

This gives us more options at reduce time in terms of how we do the
final merge of the buckeets to produce the final result

* Revert "Collects more buckets than needed on shards"

This reverts commit 993c782d11.

* Adds ability to merge within a rounding

* Fixes nonn-timezone doc test failure

* Fix time zone tests

* iterates on tests

* Adds test case and documentation changes

Added some notes in the documentation about the intervals that can bbe
returned.

Also added a test case that utilises the merging of conseecutive buckets

* Fixes performance bug

The bug meant that getAppropriate rounding look a huge amount of time
if the range of the data was large but also sparsely populated. In
these situations the rounding would be very low so iterating through
the rounding values from the min key to the max keey look a long time
(~120 seconds in one test).

The solution is to add a rough estimate first which chooses the
rounding based just on the long values of the min and max keeys alone
but selects the rounding one lower than the one it thinks is
appropriate so the accurate method can choose the final rounding taking
into account the fact that intervals are not always fixed length.

Thee commit also adds more tests

* Changes to only do complex reduction on final reduce

* merge latest with master

* correct tests and add a new test case for 10k buckets

* refactor to perform bucket number check in innerBuild

* correctly derive bucket setting, update tests to increase bucket threshold

* fix checkstyle

* address code review comments

* add documentation for default buckets

* fix typo
2018-07-13 13:08:35 -04:00
Jimi Ford e955ffc38d Docs: fix typo in datehistogram (#31972) 2018-07-11 15:04:57 -04:00
Peter Evers ea15284230 Docs: Match the examples in the description (#31710)
Prose drifted from snippet.
2018-07-02 14:12:49 -04:00
Peter Evers 050fbc8f3d Docs: Fix description of percentile ranks example example (#31652) 2018-06-28 09:29:56 -04:00
Sue Gallagher 357a07e7a2
[DOCS] Fix heading format errors (#31483)
* [DOCS] Fix heading format errors. Closes #31327

* [DOCS] Fix heading format errors. Closes #31327
2018-06-25 17:25:32 -07:00
Jonathan Little 8e4768890a Migrate scripted metric aggregation scripts to ScriptContext design (#30111)
* Migrate scripted metric aggregation scripts to ScriptContext design #29328

* Rename new script context container class and add clarifying comments to remaining references to params._agg(s)

* Misc cleanup: make mock metric agg script inner classes static

* Move _score to an accessor rather than an arg for scripted metric agg scripts

This causes the score to be evaluated only when it's used.

* Documentation changes for params._agg -> agg

* Migration doc addition for scripted metric aggs _agg object change

* Rename "agg" Scripted Metric Aggregation script context variable to "state"

* Rename a private base class from ...Agg to ...State that I missed in my last commit

* Clean up imports after merge
2018-06-25 12:01:33 +01:00
Colin Goodheart-Smithe 58e9446e00
Removes experimental tag from scripted_metric aggregation (#31298) 2018-06-13 17:24:32 +01:00
Christoph Büscher 4777d8a2df
[Docs] Fix typo in Min Aggregation reference (#30899) 2018-05-31 15:05:03 +02:00
Jim Ferenczi e33d107f84
Add missing_bucket option in the composite agg (#29465)
This change adds a new option to the composite aggregation named `missing_bucket`.
This option can be set by source and dictates whether documents without a value for the
source should be ignored. When set to true, documents without a value for a field emits
an explicit `null` value which is then added in the composite bucket.
The `missing` option that allows to set an explicit value (instead of `null`) is deprecated in this change and will be removed in a follow up (only in 7.x).
This commit also changes how the big arrays are allocated, instead of reserving
the provided `size` for all sources they are created with a small intial size and they grow
depending on the number of buckets created by the aggregation:
Closes #29380
2018-05-30 09:48:40 +02:00
Julie Tibshirani 638a719370
Ensure that ip_range aggregations always return bucket keys. (#30701) 2018-05-24 08:55:14 -07:00
Piotr Prądzyński a0a8c4f186 filters agg docs duplicated 'bucket' word removal (#30677)
In one place word 'bucket' was duplicated.
2018-05-17 15:21:50 +01:00
Piotr Prądzyński cefbd29db3 top_hits doc example description update (#30676)
Example description does not fit example code.
2018-05-17 15:21:25 +01:00
Zachary Tong df853c49c0
Add a MovingFunction pipeline aggregation, deprecate MovingAvg agg (#29594)
This pipeline aggregation gives the user the ability to script functions that "move" across a window
of data, instead of single data points.  It is the scripted version of MovingAvg pipeline agg.

Through custom script contexts, we expose a number of convenience methods:

 - MovingFunctions.max()
 - MovingFunctions.min()
 - MovingFunctions.sum()
 - MovingFunctions.unweightedAvg()
 - MovingFunctions.linearWeightedAvg()
 - MovingFunctions.ewma()
 - MovingFunctions.holt()
 - MovingFunctions.holtWinters()
 - MovingFunctions.stdDev()

The user can also define any arbitrary logic via their own scripting, or combine with the above methods.
2018-05-16 10:57:00 -04:00
Jason Tedor 4a4e3d70d5
Default to one shard (#30539)
This commit changes the default out-of-the-box configuration for the
number of shards from five to one. We think this will help address a
common problem of oversharding. For users with time-based indices that
need a different default, this can be managed with index templates. For
users with non-time-based indices that find they need to re-shard with
the split API in place they no longer need to resort only to
reindexing.

Since this has the impact of changing the default number of shards used
in REST tests, we want to ensure that we still have coverage for issues
that could arise from multiple shards. As such, we randomize (rarely)
the default number of shards in REST tests to two. This is managed via a
global index template. However, some tests check the templates that are
in the cluster state during the test. Since this template is randomly
there, we need a way for tests to skip adding the template used to set
the number of shards to two. For this we add the default_shards feature
skip. To avoid having to write our docs in a complicated way because
sometimes they might be behind one shard, and sometimes they might be
behind two shards we apply the default_shards feature skip to all docs
tests. That is, these tests will always run with the default number of
shards (one).
2018-05-14 12:22:35 -04:00