Data-stream mappings require a @timestamp field to be present and configured
as a date with a specific set of parameters. The index-wide setting of
ignore_malformed can cause problems here if it is set to true, because it needs
to be false for the @timestamp field.
This commit detects whether a set of mappings is configured for a data stream by checking
for the presence of a DataStreamTimestampFieldMapper metadata field, and passes
that information on during Mapper construction as part of the MapperBuilderContext.
DateFieldMapper.Builder now checks whether it is building the timestamp field of a data
stream and, if so, forces ignore_malformed to false.
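Roughly, the new check has this shape (a simplified sketch with assumed names; the real flag is carried on MapperBuilderContext):
```
// Simplified sketch, not the actual DateFieldMapper.Builder code. The
// isDataStreamTimestampField flag stands in for the information carried on
// the real MapperBuilderContext.
class DateFieldBuilderSketch {
    private Boolean ignoreMalformed; // null when not explicitly configured

    boolean resolveIgnoreMalformed(boolean isDataStreamTimestampField, boolean indexWideDefault) {
        if (isDataStreamTimestampField) {
            // @timestamp on a data stream must never silently drop bad dates
            return false;
        }
        // otherwise honour the explicit setting, falling back to the index-wide default
        return ignoreMalformed != null ? ignoreMalformed : indexWideDefault;
    }
}
```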
Relates to #96051
We mostly have a handful of `FieldType` values here across all mappers, and none of them contain
attributes. There are only so many combinations, so let's deduplicate them to save some heap and set up
subsequent mapper heap savings.
The copyTo builder is really hard to reason about when it comes to
mapper merging, because the `reset` method would actually mutate an
existing mapper. That seems dangerous, and the whole thing is quite
inefficient as well. This PR removes the builder and uses a copy
constructor for copy-on-write instead, avoiding instance creation on mapper
merges here and there and leaving no doubt that these objects are
immutable.
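A minimal sketch of the copy-on-write shape this leaves us with (illustrative, not the actual `CopyTo` class):
```
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Illustrative immutable CopyTo: merging produces a new instance instead of
// mutating the existing one via a reset() call.
public final class CopyTo {
    private final List<String> fields;

    public CopyTo(List<String> fields) {
        this.fields = Collections.unmodifiableList(new ArrayList<>(fields));
    }

    // copy-on-write: returns a new instance, leaving this one untouched
    public CopyTo withAddedFields(List<String> extra) {
        if (extra.isEmpty()) {
            return this; // no new instance needed on a no-op merge
        }
        List<String> merged = new ArrayList<>(fields);
        merged.addAll(extra);
        return new CopyTo(merged);
    }

    public List<String> fields() {
        return fields;
    }
}
```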
This PR adapts the unified highlighter to use the Weight#matches mode by default when possible.
This has been the default mode in Lucene for some time now. For cases where the matches mode won't work (nested and parent-child queries),
it is disabled automatically.
I didn't expose an option to explicitly disable this mode because it should be seen as an internal implementation detail.
With this change, matches that span multiple terms are highlighted together (something users have requested for years) and clauses that don't match the document are ignored.
Document parsing methods currently throw MapperParsingException. This
isn't very helpful, as it doesn't contain any information about where the parse
error happened - it is designed for parsing mappings, which are realised into
Java maps before being examined. This commit introduces a new exception
specifically for document parsing that extends XContentException, so that
it reports the current position of the parser as part of its error message.
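In spirit, the new exception looks something like this (a heavily simplified sketch):
```
// Simplified sketch: the parser's current location is baked into the message,
// so errors point at the offending spot in the document,
// e.g. "[5:13] failed to parse field [foo] of type [long]".
public class DocumentParsingException extends RuntimeException {
    public DocumentParsingException(int line, int column, String message) {
        super("[" + line + ":" + column + "] " + message);
    }
}
```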
Fixes #85083
IndexAnalyzers is currently always a concrete class wrapping several
Maps of NamedAnalyzers. This means that whenever it is used it needs
to instantiate all of its component analyzers, making testing much heavier
than it needs to be. It also means that things like overriding analysis for
legacy indexes are pushed into mapper parameters, rather than being
handled in a single place.
This commit makes IndexAnalyzers into an interface, with an anonymous
concrete implementation that handles reloading and closing for index
shards.
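Sketched (not the exact interface), this looks something like:
```
import java.io.Closeable;
import java.io.IOException;
import org.apache.lucene.analysis.Analyzer;

// Sketch: with a single abstract method, tests can stub the whole thing with
// a lambda instead of instantiating maps of concrete analyzers.
interface IndexAnalyzers extends Closeable {
    Analyzer getAnalyzer(String name);

    @Override
    default void close() throws IOException {
        // the anonymous shard-level implementation overrides this to handle
        // reloading and closing of its wrapped analyzers
    }
}
```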
This was only needed because the percolator uses a MemoryIndex which did
not support stored fields, and so when it ran a highlighting phase it needed to
force it to read from source. MemoryIndex added stored fields support in
Lucene 9.5, so we can remove this internal parameter.
The parameter remains available, but deprecated, via the REST layer, and no
longer has any effect.
The annotation highlighter can miss annotations if they overlap with another search
term. This commit re-sorts incoming passages to ensure that all terms are seen
by the highlighter.
Fixes #91944
This commit adds a new test framework for configuring and orchestrating
test clusters for both Java and YAML REST testing. This will eventually
replace the existing "test-clusters" Gradle plugin and the build-time
cluster orchestration.
This adds support for the `fields` API from Painless into synthetic
`_source` for `text` fields that have a `keyword` sub-field. This sounds kind of
esoteric, but it's the default mapping for strings you send in
JSON, so synthetic `_source` supports it. So the `fields` API should too.
This adds support for the `field` scripting API in many but not all
cases. Before this change, numbers, dates, and IPs supported the `field`
API when running with _source in synthetic mode because they always have
doc values. This change adds support for `match_only_text`, `store`d
`keyword` fields, and `store`d `text` fields. Two remaining field
configurations work with synthetic _source and do not work with `field`:
* A `text` field with a sub-`keyword` field that has `doc_values`
* A `text` field with a sub-`keyword` field that is `store`d

We currently work out whether or not a mapper should be storing additional
values for synthetic source by looking at the DocumentParserContext. However,
this value does not change for the lifetime of the mapper - it is defined by
metadata on the root mapper and is immutable - and DocumentParserContext
feels like the wrong place for this information as it holds context specific
to the document being parsed.
This commit moves synthetic source status information from DocumentParserContext
to MapperBuilderContext instead. Mappers which need this information retrieve
it at build time and hold it on final fields.
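Schematically (a sketch with assumed names):
```
// Sketch: the flag is resolved once from the root mapper's metadata, carried
// on the builder context, and captured by mappers in a final field.
class MapperBuilderContextSketch {
    private final boolean syntheticSource;

    MapperBuilderContextSketch(boolean syntheticSource) {
        this.syntheticSource = syntheticSource;
    }

    boolean isSourceSynthetic() {
        return syntheticSource;
    }
}

class SomeFieldMapper {
    private final boolean storeForSyntheticSource;

    SomeFieldMapper(MapperBuilderContextSketch context) {
        // resolved at build time rather than re-derived per parsed document
        this.storeForSyntheticSource = context.isSourceSynthetic();
    }
}
```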
This adds support for `ignore_malformed` to numeric fields other than
`scaled_float` in synthetic `_source`. Their values are saved to a
stored field and loaded to render the `_source`.
This makes sure that all field types that support `ignore_malformed` do
so in the same way.
Production changes:
* All mappers have an `ignoreMalformed` method that must return `true` if
the `ignore_malformed` mapping parameter was configured for the field.
It defaults to `false` because many fields either don't have a concept
of a "malformed" value or don't have the ability to ignore malformed
values.
* Fix the `scaled_float` field to store its field name in `_ignored` if
it ignores any malformed values. This is how all other field mappers
work.
Test changes:
* `MapperTestCase` forces subclasses to declare whether they
`supportIgnoreMalformed` or not.
* If `MapperTestCase` subclasses `supportIgnoreMalformed`, they must
define some `exampleMalformedValues` (see the sketch after this list).
* `MapperTestCase` always grows three new tests:
* One that creates a field without setting `ignore_malformed` and
verifies that all `exampleMalformedValues` throw the expected errors.
* One that explicitly configures `ignore_malformed` to false and, if
`supportIgnoreMalformed`, verifies the errors again. If not
`supportIgnoreMalformed`, it verifies that the parameter is unknown.
* One that explicitly configures `ignore_malformed` to true and, if
`supportIgnoreMalformed`, verifies that parsing doesn't produce
errors and correctly produces `_ignored`. If not
`supportIgnoreMalformed`, it verifies that the parameter is unknown.
* Moved some subclasses of `MapperTestCase` from
`internalClusterTests` to `tests`. This isn't strictly required but
that's the right place for them.
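A subclass opting in might look roughly like this (a sketch; the exact hook names and signatures in `MapperTestCase` may differ):
```
import java.util.List;

// Hypothetical sketch of the hooks named above.
abstract class NumberFieldMapperTests /* extends MapperTestCase */ {
    protected boolean supportsIgnoreMalformed() {
        return true;
    }

    // values that should throw without ignore_malformed and be listed in
    // _ignored when ignore_malformed is true
    protected List<String> exampleMalformedValues() {
        return List.of("not_a_number", "3.5 apples");
    }
}
```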
This PR reworks the testing conventions precommit plugin. This plugin now:
- is compatible with YAML and Java REST tests and internalClusterTest (i.e. different sourceSets per test type)
- enforces test base class and simple naming conventions (as it did before)
- adds one check task per test sourceSet
- uses the worker api to improve task execution parallelism and encapsulation
- is gradle configuration cache compatible
This also ports the TestingConventions integration testing to Spock and removes the build-tools-internal/test kit folder that is not required anymore. We also add some common logic for testing Java-related Gradle plugins.
We will apply further cleanup to other tests within our test suite in a dedicated follow-up.
Speeding this up some more as it's now 50% of the bootstrap time of the many-shards benchmarks.
Iterating an array here in all cases is quite a bit faster than iterating various kinds of lists
and doesn't complicate the code. Also removes a redundant call to `getValue()` for each parameter
during serialization.
Adds support for "text" fields in archive indices, with the goal of adding simple filtering support on text fields when
querying archive indices.
There are some differences to regular text fields:
- no global statistics: queries on text fields return constant score (similar to match_only_text).
- analyzer fields can be updated
- if the defined analyzer is not available, it falls back to the default analyzer
- no guarantees that analyzers are BWC
The above limitations also give us the flexibility to eventually swap out the implementation with a "runtime-text field"
variant, and hence only provide those capabilities that can be emulated via a runtime field.
Relates #81210
In the many-shards benchmarks the singleton maps storing just a single
analyzer for each keyword field mapper cost around 5% of the total heap
usage on data nodes (700MB for ~15k indices which translate into ~16M instances
of keyword field mapper for Beats mappings).
Creating specific implementations for the zero-, one-, and many-analyzers
use cases, which already have their own specialized constructors, eliminates this
overhead completely.
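The shape of the fix, sketched with illustrative names:
```
import java.util.Map;
import org.apache.lucene.analysis.Analyzer;

// Sketch with illustrative names: instead of every keyword field mapper
// holding a singleton Map, specialized lookups cover the common cases.
interface AnalyzerLookup {
    Analyzer get(String name);
}

final class ZeroAnalyzers implements AnalyzerLookup {
    public Analyzer get(String name) {
        return null; // nothing configured
    }
}

final class OneAnalyzer implements AnalyzerLookup {
    private final String name;
    private final Analyzer analyzer;

    OneAnalyzer(String name, Analyzer analyzer) {
        this.name = name;
        this.analyzer = analyzer;
    }

    public Analyzer get(String name) {
        return this.name.equals(name) ? analyzer : null;
    }
}

final class ManyAnalyzers implements AnalyzerLookup {
    private final Map<String, Analyzer> analyzers;

    ManyAnalyzers(Map<String, Analyzer> analyzers) {
        this.analyzers = analyzers;
    }

    public Analyzer get(String name) {
        return analyzers.get(name);
    }
}
```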
relates #77466
This attempts to shrink the index by implementing a "synthetic _source" field.
You configure it in the mapping:
```
{
  "mappings": {
    "_source": {
      "synthetic": true
    }
  }
}
```
And we just stop storing the `_source` field - kind of. When you go to access
the `_source` we regenerate it on the fly by loading doc values. Doc values
don't preserve the original structure of the source you sent so we have to
make some educated guesses. And we have a rule: the source we generate would
result in the same index if you sent it back to us. That way you can use it
for things like `_reindex`.
Fetching the `_source` from doc values does slow down loading somewhat. See
numbers further down.
## Supported fields
This only works for the following fields:
* `boolean`
* `byte`
* `date`
* `double`
* `float`
* `geo_point` (with precision loss)
* `half_float`
* `integer`
* `ip`
* `keyword`
* `long`
* `scaled_float`
* `short`
* `text` (when there is a `keyword` sub-field that is compatible with this feature)
## Educated guesses
The synthetic source generator makes `_source` fields that are:
* sorted alphabetically
* as "objecty" as possible
* pushes all arrays to the "leaf" fields
* sorts most array values
* removes duplicate text and keyword values
These are mostly artifacts of how doc values are stored.
### sorted alphabetically
```
{
  "b": 1,
  "c": 2,
  "a": 3
}
```
becomes
```
{
  "a": 3,
  "b": 1,
  "c": 2
}
```
### as "objecty" as possible
```
{
  "a.b": "foo"
}
```
becomes
```
{
  "a": {
    "b": "foo"
  }
}
```
### pushes all arrays to the "leaf" fields
```
{
  "a": [
    {
      "b": "foo",
      "c": "bar"
    },
    {
      "c": "bort"
    },
    {
      "b": "snort"
    }
  ]
}
```
becomes
```
{
  "a": {
    "b": ["foo", "snort"],
    "c": ["bar", "bort"]
  }
}
```
### sorts most array values
```
{
  "a": [2, 3, 1]
}
```
becomes
```
{
  "a": [1, 2, 3]
}
```
### removes duplicate text and keyword values
```
{
  "a": ["bar", "baz", "baz", "baz", "foo", "foo"]
}
```
becomes
```
{
  "a": ["bar", "baz", "foo"]
}
```
## `_recovery_source`
Elasticsearch's shard "recovery" process needs `_source` *sometimes*. So does
cross cluster replication. If you disable source or filter it somehow we store
a `_recovery_source` field for as long as the recovery process might need it.
When everything is running smoothly that's generally a few seconds or minutes.
Then the field is removed on merge. This synthetic source feature continues
to produce `_recovery_source` and relies on it for recovery. It's *possible*
to synthesize `_source` during recovery but we don't do it.
That means that synthetic source doesn't speed up writing the index. But in the
future we might be able to turn this on to trade writing less data at index
time for slower recovery and cross cluster replication. That's an area of
future improvement.
## perf numbers
I loaded the entire tsdb data set with this change and compared the sizes:
```
             standard    ->  synthetic
store size   31.0 GB     ->  7.0 GB    (77.5% reduction)
_source      24695.7 MB  ->  47.6 MB   (99.8% reduction - synthetic is in _recovery_source)
```
A second _forcemerge a few minutes after rally finishes should remove the
remaining 47.6MB of _recovery_source.
With this change, fetching the source for 1,000 documents seems to take about 500ms. I
spot-checked a lot of different areas and haven't seen any other performance hit. I
*expect* this performance impact depends on the number of doc values fields
in the index and how sparse they are.
The default type is incredibly common and instances are not trivial
in size with 16 fields. Heap dumps from larger data nodes holding many
keyword fields with the default field type can contain hundreds of MB
of heap used for these.
Same reasoning applies to the `TextSearchInfo` deduplication.
`TextSearchInfo` was turned into a record to give us an `equals` implementation.
There are many places in Elasticsearch which must decode some stream of
bytes into characters. Most of the time this is expected to be UTF-8
encoded data, and we hardcode that charset name. However, methods in the
JDK that take a String charset name require catching
UnsupportedEncodingException. Yet most of these APIs also have a variant
of the same method which takes a known Charset instance, for which we
can use StandardCharsets.UTF_8. This commit converts most instances of
passing string charset names to use a Charset instance.
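For example, the two variants look like this (plain JDK code):
```
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;

public class CharsetExample {
    public static void main(String[] args) throws UnsupportedEncodingException {
        byte[] bytes = {104, 105};

        // Before: a String charset name forces callers to handle a checked exception
        String before = new String(bytes, "UTF-8");

        // After: a Charset instance can never be unsupported, so no exception
        String after = new String(bytes, StandardCharsets.UTF_8);

        System.out.println(before + " " + after);
    }
}
```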
This param was incredibly expensive to set up when parsing mappings and
is one of the big contributors to mapping parsing slowness on master.
Since all uses of this parameter type are statically known, it seems most
straightforward to simply hard-code the validators statically so that we save
some allocations.
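The idea, sketched (illustrative; the real validators check mapper-parameter invariants):
```
import java.util.function.Consumer;

// Sketch: one shared validator instance per known check, instead of a fresh
// lambda allocation for every parsed parameter.
final class Validators {
    private Validators() {}

    static final Consumer<Integer> NON_NEGATIVE = value -> {
        if (value < 0) {
            throw new IllegalArgumentException("value must be >= 0 but was [" + value + "]");
        }
    };
}
```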
Lucene issues that resulted in Elasticsearch changes:
- LUCENE-9820: Separate logic for reading the BKD index from logic for intersecting it.
- LUCENE-10377: Replace 'sortPos' with 'enableSkipping' in SortField.getComparator()
- LUCENE-10301: Make the test-framework a proper module by moving all test classes to org.apache.lucene.tests
- LUCENE-10300: Rewrite how resources are read in the Ukrainian Morfologik analyzer
- LUCENE-10054: Make HnswGraph hierarchical
Follow-up from #77144 (comment), converting id/_id to always be strings instead of integers. This makes the type value in the Elasticsearch specification only string instead of string | number.
This change was generated using the following command on Ubuntu:
```
find . -type f -name "*.yml" -print0 | xargs -0 sed -i -r 's/([^a-zA-Z0-9_\.]id|[^a-zA-Z0-9_]_id):(\s*)([0-9]+)/\1:\2"\3"/g'
```
The same fix as #80945: register a settings update consumer for the end_time of
the tsdb index even when the end_time setting wasn't registered.
Pass the feature flag to reindex yaml tests.
Co-authored-by: Igor Motov <igor@motovs.org>
Fix the split package org.elasticsearch.common.xcontent, between server and the x-content lib. Move the x-content lib exported package from org.elasticsearch.common.xcontent to org.elasticsearch.xcontent (following the naming convention of similar libraries). Removing split packages is a prerequisite to modularization.
The annotated text mapper plugin reuses package names from server. This
commit moves the implementation classes into an annotated text package
specifically for the plugin.
Mapper.build() currently takes a ContentPath object that it can use to generate
field type names that will include its parent names. We would like to expand field types
to include more information about their parents, and ContentPath does not hold this
information. This commit replaces the ContentPath parameter with a new
MapperBuilderContext, which currently holds only the content path information but
can be expanded in future to hold parent relationship information.
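Conceptually (a sketch of the new context):
```
// Sketch: the new context can still answer the "full field name" question a
// ContentPath answered, with room to grow parent information later.
final class BuilderContextSketch {
    private final String parentPath; // "" at the root

    BuilderContextSketch(String parentPath) {
        this.parentPath = parentPath;
    }

    String buildFullName(String leafName) {
        return parentPath.isEmpty() ? leafName : parentPath + "." + leafName;
    }
}
```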
Relates to #75474
Fixes the text field mapper and the analyzers class so they no longer retain parameter references, which are really heavy.
This makes `TextFieldMapper` take hundreds of bytes per instance, compared to multiple kB before.
Closes #73845
This introduces a basic public yaml rest test plugin that is supposed to be used by external
Elasticsearch plugin authors. This is driven by #76215
- Rename yaml-rest-test to intern-yaml-rest-test
- Use public yaml plugin in example plugins
Co-authored-by: Mark Vieira <portugee@gmail.com>
ParseContext is used to parse documents. It was easily confused with ParserContext (now renamed to MappingParserContext) which is instead used to parse mappings.
To remove any confusion, this commit renames ParseContext to DocumentParserContext and adapts its subclasses accordingly.
Modularization of the JDK has been ongoing for several years. Recently
in Java 16 the JDK began enforcing module boundaries by default. While
Elasticsearch does not yet use the module system directly, there are
some side effects even for those projects not modularized (e.g. #73517).
Before we can even begin to think about how to modularize, we must
Prepare The Way by enforcing packages only exist in a single jar file,
since the module system does not allow packages to coexist in multiple
modules.
This commit adds a precommit check to the build which detects split
packages. The expectation is that we will add the existing split
packages to the ignore list so that any new classes will not exacerbate
the problem, and the work to cleanup these split packages can be
parallelized.
relates #73525
The majority of field mappers read a single value from their positioned
XContentParser, and do not need to call nextToken. There is a general
assumption that the same holds for any multifields defined on them, and
so the XContentParser is passed down to their multifields builder as-is.
This assumption does not hold for mappers that accept JSON objects,
and so we have a second mechanism for passing values around called
'external values', where a mapper can set a specific value on its context
and child mappers can then check for these external values before reading
from xcontent. The disadvantage of this is that every field mapper now
needs to check its context for external values. Because the values are
defined by their java class, we can also know that in the vast majority of
cases this functionality is unused. We have only two mappers that actually
make use of this, CompletionFieldMapper and GeoPointFieldMapper.
This commit removes external values entirely, and replaces them with the ability
to pass a modified XContentParser to multifields. FieldMappers can just check
the parser attached to their context for data and don't need to worry about
multiple sources.
Plugins implementing field mappers will need to take the removal of external
values into account. Implementations that are passing structured objects
as external values should instead use ParseContext.switchParser and
wrap the objects using MapXContentParser.wrapObject().
GeoPointFieldMapper passes on a fake parser that just wraps its input data
formatted as a geohash; CompletionFieldMapper has a slightly more complicated
parser that in general wraps its metadata, but if textOrNull() is called without
the parser being advanced just returns its text input.
Relates to #56063
The FieldNamesFieldMapper is a metadata mapper defining a field that
can be used for exists queries if a mapper does not use doc values or
norms. Currently, data is added to it via a special method on FieldMapper
that pulls the metadata mapper from a mapping lookup, checks to see
if it is enabled, and then adds the relevant value to a lucene document.
This is one of only two places that pulls a metadata mapper from the
MappingLookup, and it would be nice to remove this method. This commit
refactors field name handling by instead storing the names of fields to
index in the fieldnames field in a set on the ParseContext, and then
building the field itself in FieldNamesFieldMapper.postParse(). This means
that all of the responsibility for enabling indexing, etc, is handled within
the metadata mapper itself.
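In outline, the flow described above looks something like this (a sketch, not the actual classes):
```
import java.util.HashSet;
import java.util.Set;

// Sketch: during parsing, mappers record the names of fields that should be
// indexed into _field_names; the metadata mapper builds the actual index
// field in its own postParse hook.
class ParseContextSketch {
    private final Set<String> fieldNamesFieldValues = new HashSet<>();

    void addToFieldNames(String field) {
        fieldNamesFieldValues.add(field);
    }

    Set<String> getFieldNames() {
        return fieldNamesFieldValues;
    }
}

class FieldNamesFieldMapperSketch {
    private final boolean enabled;

    FieldNamesFieldMapperSketch(boolean enabled) {
        this.enabled = enabled;
    }

    void postParse(ParseContextSketch context) {
        if (enabled == false) {
            return; // the enabled check now lives inside the metadata mapper
        }
        for (String name : context.getFieldNames()) {
            // add a term for [name] to the _field_names lucene field here
        }
    }
}
```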
`MappedFieldType` only allows configuring `match` and `prefix` queries today.
This change makes it possible to configure how to create `wildcard` and `fuzzy`
queries as well.
This will allow making the upcoming `match_only_text` field fully support
intervals queries.
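Sketched, the extension point looks something like this (illustrative; the real methods take more arguments):
```
import org.apache.lucene.search.Query;

// Sketch: field types that cannot support a query type keep the throwing
// defaults; others, like match_only_text, override them.
abstract class MappedFieldTypeSketch {
    abstract String name();

    Query wildcardQuery(String pattern) {
        throw new IllegalArgumentException(
            "Field [" + name() + "] does not support wildcard queries");
    }

    Query fuzzyQuery(String term) {
        throw new IllegalArgumentException(
            "Field [" + name() + "] does not support fuzzy queries");
    }
}
```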
We've had a few bugs in the fields API where it doesn't behave like we'd
expect. Typically this happens because it isn't obvious what we expect. So
we'll try and use randomized testing to ferret out what we want. This adds
a test for most field types that asserts that `fields` works similarly
to `docvalue_fields`. We expect this to be true for most fields.
It does so by forcing all subclasses of `MapperTestCase` to define a
method that makes random values. It declares a few other hooks that
subclasses can override to further randomize the test.
We skip the test for a few field types that don't have doc values:
* `annotated_text`
* `completion`
* `search_as_you_type`
* `text`
We should come up with some way to test these without doc values, even
if it isn't as nice. But that is a problem for another time, I think.
We skip the test for a few more types just because I wanted to cut this
PR in half so we could get to reviewing it earlier. We'll get to those
in a follow up change.
I've filed a few bugs for things that are inconsistent with
`docvalue_fields`. Typically that means that we have to limit the
random values that we generate to those that *do* round trip properly.
Custom position increments are handled by wrapping analyzers
with a NamedAnalyzer and passing the custom increment through
to its constructor. However, phrase and prefix analyzers use
delegating analyzer wrappers to add extra filtering to their parent
analyzers, and we can't wrap analyzers multiple times because this
wrecks reuse strategies, so we unwrap the parent before passing
it to phrase and prefix builders. This unwrapping means that we
lose the custom position increments; in particular, it means that
we can end up with a position increment gap of -1, which is the
sentinel value for the unset parameter - and that means exceptions
at index time for backwards-moving positions on fields with multiple
values.
This commit removes the sentinel value and uses standard parameter
defaults and the isConfigured() method instead, plus it adds some
more comprehensive testing for position increments when combined
with phrase/prefix index options on text fields.
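The parameter handling change, sketched (illustrative, not the actual Parameter class):
```
// Sketch: track whether a parameter was explicitly set instead of encoding
// "unset" as a sentinel like -1, which can leak into index-time logic.
final class IntParam {
    private final int defaultValue;
    private Integer configured; // null until explicitly set

    IntParam(int defaultValue) {
        this.defaultValue = defaultValue;
    }

    void set(int value) {
        this.configured = value;
    }

    boolean isConfigured() {
        return configured != null;
    }

    int get() {
        return isConfigured() ? configured : defaultValue;
    }
}
```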
Fixes #70049
With the newly introduced `max_analyzed_offset` the analyzer of
`AnnotatedTextHighlighter` was wrapped twice with the
`LimitTokenOffsetAnalyzer` by mistake.
Follows: #67325
Add a `max_analyzed_offset` query parameter to allow users
to limit the highlighting of text fields to a value less than or equal to the
`index.highlight.max_analyzed_offset`, thus avoiding an exception when
the length of the text field exceeds the limit. The highlighting still takes place,
but stops at the length defined by the new parameter.
Closes: #52155
DocumentMapper does not need to implement ToXContent; in fact it is its inner Mapping that needs to, and it already does. Consumers can switch to calling mapping() and invoking toXContent on it.