elasticsearch

Commit Graph

Author	SHA1	Message	Date
Nikolay Vasiliev	16956a1a05	[DOCS] Clarify 'type' parameter meaning for custom analyzer (#34012 ) This pull request improves the docs on the meaning of type parameter on the custom analyzer doc page. Closes #33456	2018-09-25 15:32:27 +02:00
Alan Woodward	5107949402	Allow TokenFilterFactories to rewrite themselves against their preceding chain (#33702 ) We currently special-case SynonymFilterFactory and SynonymGraphFilterFactory, which need to know their predecessors in the analysis chain in order to correctly analyze their synonym lists. This special-casing doesn't work with Referring filter factories, such as the Multiplexer or Conditional filters. We also have a number of filters (eg the Multiplexer) that will break synonyms when they appear before them in a chain, because they produce multiple tokens at the same position. This commit adds two methods to the TokenFilterFactory interface. * `getChainAwareTokenFilterFactory()` allows a filter factory to rewrite itself against its preceding filter chain, or to resolve references to other filters. It replaces `ReferringFilterFactory` and `CustomAnalyzerProvider.checkAndApplySynonymFilter`, and by default returns `this`. * `getSynonymFilter()` defines whether or not a filter should be applied when building a synonym list `Analyzer`. By default it returns `true`. Fixes #33609	2018-09-19 15:52:14 +01:00
Alan Woodward	f598297f55	Add predicate_token_filter (#33431 ) This allows users to filter out tokens from a TokenStream using painless scripts, instead of having to write specialised Java code and packaging it up into a plugin. The commit also refactors the AnalysisPredicateScript.Token class so that it wraps and makes read-only an AttributeSource.	2018-09-11 09:16:39 +01:00
Jim Ferenczi	7ad71f906a	Upgrade to a Lucene 8 snapshot (#33310) The main benefit of the upgrade for users is the search optimization for top scored documents when the total hit count is not needed. However, this optimization is not activated in this change; another issue has been opened to discuss how it should be integrated smoothly. Some comments about the change: * Tests that can produce negative scores have been adapted but we need to forbid them completely: #33309 Closes #32899	2018-09-06 14:42:06 +02:00
Alan Woodward	636442700c	Add conditional token filter to elasticsearch (#31958 ) This allows tokenfilters to be applied selectively, depending on the status of the current token in the tokenstream. The filter takes a scripted predicate, and only applies its subfilter when the predicate returns true.	2018-09-05 14:52:43 +01:00
Matthias Sieber	a39f6f09f4	fixed elements in array of produced terms (#32519 )	2018-08-02 11:12:15 -04:00
Christoph Büscher	61486680a2	Add exclusion option to `keep_types` token filter (#32012) Currently the `keep_types` token filter includes all token types specified using its `types` parameter. Lucene's TypeTokenFilter also provides a second mode where instead of keeping the specified tokens (include) they are filtered out (exclude). This change exposes this option as a new `mode` parameter that can either take the values `include` (the default, if not specified) or `exclude`. Closes #29277	2018-07-17 09:04:41 +02:00
Sohaib Iftikhar	88c270d844	Added lenient flag for synonym token filter (#31484 ) * Added lenient flag for synonym-tokenfilter. Relates to #30968 * added docs for synonym-graph-tokenfilter -- Also made lenient final -- changed from !lenient to lenient == false * Changes after review (1) -- Renamed to ElasticsearchSynonymParser -- Added explanation for ElasticsearchSynonymParser::add method -- Changed ElasticsearchSynonymParser::logger instance to static * Added lenient option for WordnetSynonymParser -- also added more documentation * Added additional documentation * Improved documentation	2018-07-10 17:11:50 -04:00
Alan Woodward	5683bc60a6	Multiplexing token filter (#31208) The `multiplexer` filter emits multiple tokens at the same position, each version of the token having been passed through a different filter chain. Identical tokens at the same position are removed. This allows users to, for example, index lowercase and original-case tokens, or stemmed and unstemmed versions, in the same field, so that they can search for a stemmed term within x positions of an unstemmed term.	2018-06-20 10:16:26 +01:00
Alan Woodward	8c0ec05a12	Expose lucene's RemoveDuplicatesTokenFilter (#31275 )	2018-06-18 09:46:12 +01:00
Itamar Syn-Hershko	5f172b6795	[Feature] Adding a char_group tokenizer (#24186 ) === Char Group Tokenizer The `char_group` tokenizer breaks text into terms whenever it encounters a character which is in a defined set. It is mostly useful for cases where a simple custom tokenization is desired, and the overhead of use of the <<analysis-pattern-tokenizer, `pattern` tokenizer>> is not acceptable. === Configuration The `char_group` tokenizer accepts one parameter: `tokenize_on_chars`:: A string containing a list of characters to tokenize the string on. Whenever a character from this list is encountered, a new token is started. Also supports escaped values like `\\n` and `\\f`, and in addition `\\s` to represent whitespace, `\\d` to represent digits and `\\w` to represent letters. Defaults to an empty list. === Example output ```The 2 QUICK Brown-Foxes jumped over the lazy dog's bone for $2``` When the configuration `\\s-:<>` is used for `tokenize_on_chars`, the above sentence would produce the following terms: ```[ The, 2, QUICK, Brown, Foxes, jumped, over, the, lazy, dog's, bone, for, $2 ]```	2018-05-22 16:26:31 +02:00
Jim Ferenczi	bdb79d021a	Fix docs failure on language analyzers (#30722 ) This commit fixes docs failure on language analyzers when compared to the built in analyzers. The `elision` filters used by the rebuilt language analyzers should be case insensitive to match the definition of the prebuilt analyzers. Closes #30557	2018-05-22 09:58:12 +02:00
Nik Everett	9881bfaea5	Docs: Document how to rebuild analyzers (#30498 ) Adds documentation for how to rebuild all the built in analyzers and tests for that documentation using the mechanism added in #29535. Closes #29499	2018-05-14 18:40:54 -04:00
Jason Tedor	4a4e3d70d5	Default to one shard (#30539 ) This commit changes the default out-of-the-box configuration for the number of shards from five to one. We think this will help address a common problem of oversharding. For users with time-based indices that need a different default, this can be managed with index templates. For users with non-time-based indices that find they need to re-shard with the split API in place they no longer need to resort only to reindexing. Since this has the impact of changing the default number of shards used in REST tests, we want to ensure that we still have coverage for issues that could arise from multiple shards. As such, we randomize (rarely) the default number of shards in REST tests to two. This is managed via a global index template. However, some tests check the templates that are in the cluster state during the test. Since this template is randomly there, we need a way for tests to skip adding the template used to set the number of shards to two. For this we add the default_shards feature skip. To avoid having to write our docs in a complicated way because sometimes they might be behind one shard, and sometimes they might be behind two shards we apply the default_shards feature skip to all docs tests. That is, these tests will always run with the default number of shards (one).	2018-05-14 12:22:35 -04:00
Nik Everett	f9dc86836d	Docs: Test examples that recreate lang analyzers (#29535) We have a pile of documentation describing how to rebuild the built in language analyzers and, previously, our documentation testing framework made sure that the examples successfully built an analyzer but they didn't assert that the analyzer built by the documentation matches the built in analyzer. Unsurprisingly, some of the examples aren't quite right. This adds a mechanism that tests the analyzers built by the docs. The mechanism is fairly simple and brutal but it seems to be working: build a hundred random unicode sequences and send them through the `_analyze` API with the rebuilt analyzer and then again through the built in analyzer. Then make sure both APIs return the same results. Each of these calls to `_analyze` takes about 20ms on my laptop which seems fine.	2018-05-09 09:23:10 -04:00
Mayya Sharipova	34e95e5d50	[DOCS] Add supported token filters Update normalizers.asciidoc with the list of supported token filters Closes #28605	2018-02-13 14:10:25 -08:00
Jim Ferenczi	7c2bcf3953	Mark synonym_graph as beta in the docs (#28496 ) We do want to keep this functionality in the future and we provide support for it. This change is a first step towards replacing the `synonym` token filter with `synonym_graph`.	2018-02-02 16:33:48 +01:00
deepybee	48c8098e15	Fixed several typos in analyzers section (#28247 )	2018-01-18 08:51:53 +00:00
Adrien Grand	1b660821a2	Allow `_doc` as a type. (#27816 ) Allowing `_doc` as a type will enable users to make the transition to 7.0 smoother since the index APIs will be `PUT index/_doc/id` and `POST index/_doc`. This also moves most of the documentation to `_doc` as a type name. Closes #27750 Closes #27751	2017-12-14 17:47:53 +01:00
Martijn van Groningen	442c3b8bcf	docs: fix link	2017-12-13 16:51:21 +01:00
Christoph Büscher	c4fe7d3f72	[Docs] add deprecation warning for `delimited_payload_filter` renaming	2017-12-04 10:22:05 +01:00
kel	4885acb048	Replace `delimited_payload_filter` by `delimited_payload` (#26625 ) The `delimited_payload_filter` is renamed to `delimited_payload`, the old name is deprecated and should be replaced by `delimited_payload`. Closes #21978	2017-11-24 13:03:19 +01:00
Mayya Sharipova	148376c2c5	Add limits for ngram and shingle settings (#27211 ) * Add limits for ngram and shingle settings (#27211) Create index-level settings: max_ngram_diff - maximum allowed difference between max_gram and min_gram in NGramTokenFilter/NGramTokenizer. Default is 1. max_shingle_diff - maximum allowed difference between max_shingle_size and min_shingle_size in ShingleTokenFilter. Default is 3. Throw an IllegalArgumentException when trying to create NGramTokenFilter, NGramTokenizer, ShingleTokenFilter where difference between max_size and min_size exceeds the settings value. Closes #25887	2017-11-07 08:14:55 -05:00
Md. Abdulla-Al-Sun	a40c474e10	Added Bengali Analyzer to Elasticsearch with respect to the lucene update(PR#238)	2017-10-05 13:25:05 +02:00
markwalkom	dbea83a1d0	[Docs] Update length-tokenfilter.asciidoc (#26849) Made it clear what the numeric value of `Integer.MAX_VALUE` is.	2017-10-02 11:01:43 +02:00
olcbean	6952f7b560	Validate top-level keys for create index request (#23755 ) (#23869 ) This commit ensures create index requests do not ignore unknown keys passed to the request. closes #23755	2017-09-26 09:49:20 -07:00
Christoph Büscher	3827918417	Add configurable `maxTokenLength` parameter to whitespace tokenizer (#26749) Other tokenizers like the standard tokenizer allow overriding the default maximum token length of 255 using the `max_token_length` parameter. This change enables using this parameter also with the whitespace tokenizer. The range that is currently allowed is from 0 to StandardTokenizer.MAX_TOKEN_LENGTH_LIMIT, which is 1024 * 1024 = 1048576 characters. Closes #26643	2017-09-25 17:21:19 +02:00
Tahmim Ahmed Shibli	34662c9e6d	[Docs] Fix name of character filter in example. (#26724 )	2017-09-20 17:08:43 +02:00
Christoph Büscher	254c1b28e9	[Docs] Clarify behaviour of Pattern Capture Token Filter during search (#26278 ) There was some confusion about the fact that tokens emitted from a Pattern Capture Token Filter are treated as synonyms when used to analyze a search query. This commit adds an explanation to the note in the docs to emphasize this behaviour. Closes #25746	2017-08-21 14:56:52 +02:00
Clinton Gormley	ff4a2519f2	Update experimental labels in the docs (#25727 ) Relates https://github.com/elastic/elasticsearch/issues/19798 Removed experimental label from: * Painless * Diversified Sampler Agg * Sampler Agg * Significant Terms Agg * Terms Agg document count error and execution_hint * Cardinality Agg precision_threshold * Pipeline Aggregations * index.shard.check_on_startup * index.store.type (added warning) * Preloading data into the file system cache * foreach ingest processor * Field caps API * Profile API Added experimental label to: * Moving Average Agg Prediction Changed experimental to beta for: * Adjacency matrix agg * Normalizers * Tasks API * Index sorting Labelled experimental in Lucene: * ICU plugin custom rules file * Flatten graph token filter * Synonym graph token filter * Word delimiter graph token filter * Simple pattern tokenizer * Simple pattern split tokenizer Replaced experimental label with warning that details may change in the future: * Analysis explain output format * Segments verbose output format * Percentile Agg compression and HDR Histogram * Percentile Rank Agg HDR Histogram	2017-07-18 14:06:22 +02:00
Neil Rickards	5189bd14f1	[Docs] Fix typo in pattern-tokenizer.asciidoc (#25626 )	2017-07-13 18:43:48 +02:00
Simon Willnauer	e81804cfa4	Add a shard filter search phase to pre-filter shards based on query rewriting (#25658) Today if we search across a large number of shards we hit every shard. Yet, it's quite common to search across an index pattern for time based indices but filtering will exclude all results outside a certain time range, i.e. `now-3d`. While the search can potentially hit hundreds of shards, the majority of the shards might yield 0 results since there is no document that is within this date range. Kibana for instance does this regularly but used `_field_stats` to optimize the indexes they need to query. Now with the deprecation of `_field_stats` and its upcoming removal, a single dashboard in kibana can potentially turn into searches hitting hundreds or thousands of shards, and that can easily cause search rejections even though most of the requests are very likely super cheap and only need a query rewriting to early terminate with 0 results. This change adds a pre-filter phase for searches that can, if the number of shards is higher than the `pre_filter_shard_size` threshold (defaults to 128 shards), fan out to the shards and check if the query can potentially match any documents at all. While false positives are possible, a negative response means that no matches are possible. These requests are not subject to rejection and can greatly reduce the number of shards a request needs to hit. The approach here is preferable to the kibana approach with field stats since it correctly handles aliases and uses the correct threadpools to execute these requests. Further, it's completely transparent to the user and improves scalability of elasticsearch in general on large clusters.	2017-07-12 22:19:20 +02:00
Jun Ohtani	62d1969595	Parse synonyms with the same analysis chain (#8049 ) * [Analysis] Parse synonyms with the same analysis chain Synonym Token Filter / Synonym Graph Filter tokenize synonyms with whatever tokenizer and token filters appear before it in the chain. Close #7199	2017-06-20 21:50:33 +09:00
Andy Bristol	4c5bd57619	Rename simple pattern tokenizers (#25300 ) Changed names to be snake case for consistency Related to #25159, original issue #23363	2017-06-19 13:48:43 -07:00
debadair	c161d90524	[DOCS] Defined es-test-dir and plugins-examples-dir in index.asciidoc. (#25232 ) Use these attributes when specifying the location of included tests.	2017-06-15 08:54:10 -07:00
Adrien Grand	0c117145f6	Upgrade to lucene-7.0.0-snapshot-92b1783. (#25222 ) This snapshot has faster range queries on range fields (LUCENE-7828), more accurate norms (LUCENE-7730) and the ability to use fake term frequencies (LUCENE-7854).	2017-06-15 09:52:07 +02:00
Andy Bristol	48696ab544	expose simple pattern tokenizers (#25159 ) Expose the experimental simplepattern and simplepatternsplit tokenizers in the common analysis plugin. They provide tokenization based on regular expressions, using Lucene's deterministic regex implementation that is usually faster than Java's and has protections against creating too-deep stacks during matching. Both have a not-very-useful default pattern of the empty string because all tokenizer factories must be able to be instantiated at index creation time. They should always be configured by the user in practice.	2017-06-13 12:46:59 -07:00
Jim Ferenczi	2508df6cc8	Add missing link for the WordDelimiterGraphFilter	2017-04-28 17:12:38 +02:00
Adrien Grand	1be2800120	Only allow one type on 7.0 indices (#24317 ) This adds the `index.mapping.single_type` setting, which enforces that indices have at most one type when it is true. The default value is true for 6.0+ indices and false for old indices. Relates #15613	2017-04-27 08:43:20 +02:00
Nik Everett	ad69503dce	CONSOLEify analysis docs Converts the analysis docs that were marked as json into `CONSOLE` format. A few of them were in yaml but marked as json for historical reasons. I added more complete examples for a few of the less obvious sounding ones. Relates to #18160	2017-04-02 11:17:14 -04:00
Nik Everett	514187be8e	Fix language in some docs The pattern-analyzer docs contained a snippet that was an expanded regex that was marked as `[source,js]`. This changes it to `[source,regex]`. The htmlstrip-charfilter and pattern-replace-charfilter docs had examples that were actually a list of tokens but marked `[source,js]`. This marks them as `[source,text]` so they don't count as unconverted CONSOLE snippets. The pattern-replace-charfilter also had a doc whose test was skipped because of funny interaction with the test framework. This fixes the test. Three more down, eighty-two to go. Relates to #18160	2017-04-01 14:45:44 -04:00
Nik Everett	9baa48a928	CONSOLEify lang-analyzer docs CONSOLEifies the lang-analyzer docs and replaces the (invalid) empty `keyword_marker` setups that were on the page with one that contains the word "example" translated into the appropriate language. Relates to #18160	2017-04-01 14:21:58 -04:00
Abdon Pijpelink	ef1329727d	Update compound-word-tokenfilter.asciidoc (#23817 ) Updated URL to OFFO Sourceforge project	2017-03-30 12:27:32 +02:00
Ali Beyad	2120086d82	Adds pattern keyword marker filter support (#23600 ) This commit adds support for the pattern keyword marker filter in Lucene. Previously, the keyword marker filter in Elasticsearch supported specifying a keywords set or a path to a set of keywords. This commit exposes the regular expression pattern based keyword marker filter also available in Lucene, so that any token matching the pattern specified by the `keywords_pattern` setting is excluded from being stemmed by any stemming filters. Closes #4877	2017-03-28 11:13:34 -04:00
Nik Everett	a783c6c85c	CONSOLEify some more docs And expand on the `stemmer_override` examples, including the file on disk and an example of specifying the rules inline. Relates to #18160	2017-03-22 17:58:06 -04:00
Nik Everett	e860fe7363	CONSOLEify some more docs Relates to #18160	2017-03-22 17:15:14 -04:00
Nik Everett	1dee2f32a4	Docs: CONSOLEify synonym tokenfilter docs Relates to #18160	2017-03-22 16:30:52 -04:00
Nik Everett	1c1b29400b	Docs: Fix language on a few snippets They aren't `js`, they are their own thing. Relates to #18160	2017-03-22 15:57:28 -04:00
Jim Ferenczi	63bdd01eb7	Expose WordDelimiterGraphTokenFilter (#23327 ) This change exposes the new Lucene graph based word delimiter token filter in the analysis filters. Unlike the `word_delimiter` this token filter named `word_delimiter_graph` correctly handles multi terms expansion at query time. Closes #23104	2017-02-24 00:53:38 +01:00
markwalkom	ced99dde50	Update stop-analyzer.asciidoc (#23195 ) Clarified where the stopwords file needs to live	2017-02-16 13:36:15 +01:00
Adrien Grand	f3509b8003	Consolify docs/reference/analysis/tokenfilters/pattern-capture-tokenfilter.asciidoc. (#23050 )	2017-02-13 11:00:12 +01:00
Clinton Gormley	f5e7c25e24	Update normalizers.asciidoc analyzers -> normalizers	2017-02-07 12:09:39 +01:00
Shubham Aggarwal	e07e4cc4dd	Fix incorrect heading for Whitespace Tokenizer (#22883 )	2017-01-31 12:51:37 +01:00
Daniel Mitterdorfer	aece89d6a1	Make boolean conversion strict (#22200 ) This PR removes all leniency in the conversion of Strings to booleans: "true" is converted to the boolean value `true`, "false" is converted to the boolean value `false`. Everything else raises an error.	2017-01-19 07:59:18 +01:00
Michael McCandless	1d1bdd476c	Finish exposing FlattenGraphTokenFilter (#22667 )	2017-01-18 11:05:34 -05:00
Clinton Gormley	519a9c469d	Update truncate token filter to not mention the keyword tokenizer The advice predates the existence of the keyword field Closes #22650	2017-01-17 12:15:22 +01:00
Matt Weber	609d2aab15	QueryString and SimpleQueryString Graph Support (#22541 ) Add support for graph token streams to "query_String" and "simple_query_string" queries.	2017-01-11 18:59:43 +01:00
Achraf	5dc85c25d9	Hindu-Arabico-Latino Numerals (#22476 ) Hi, same edit as for : https://www.elastic.co/guide/en/elasticsearch/reference/current/analyzer-anatomy.html	2017-01-10 15:24:56 +01:00
Adrien Grand	3f805d68cb	Add the ability to set an analyzer on keyword fields. (#21919 ) This adds a new `normalizer` property to `keyword` fields that pre-processes the field value prior to indexing, but without altering the `_source`. Note that only the normalization components that work on a per-character basis are applied, so for instance stemming filters will be ignored while lowercasing or ascii folding will be applied. Closes #18064	2016-12-30 09:36:10 +01:00
Francesc Gil	dec6fc2d40	Repeated language analyzers (#22240 ) * Repeated language analyzers The `catalan` analyzer was repeated on the supported list :) * Reordered the languages to have alphabetic order * Added space for format * Reordered the languages and removed repeated	2016-12-21 17:32:02 +01:00
Thibault Pierre	e494d6a94e	Fix wrong link (#22019 )	2016-12-07 17:58:46 +01:00
Allen Torres	887fbb6387	Update lowercase-tokenizer.asciidoc (#21896 ) Fixed typo	2016-12-02 10:49:51 -05:00
Matt Weber	04e07bcdb6	Synonym Graph Support (LUCENE-6664) (#21517 ) Integrate the patch from LUCENE-6664 into elasticsearch and add support for handling a graph token stream in match/multi-match queries. This fixes longstanding bugs with multi-token synonyms returning incorrect results with proximity queries.	2016-11-28 09:25:49 -08:00
Achraf	d81a928b1f	Correction of the names of numerals (#21531) What was called Arabic numerals is actually Hindu - Eastern Arabic notation. And the Latin numerals you refer to are the Arabic numbers.	2016-11-25 14:30:49 +01:00
Pascal Borreli	fcb01deb34	Fixed typos (#20843 )	2016-10-10 14:51:47 -06:00
Clinton Gormley	22f1acde94	Docs: Pattern analyzer does not support a max_token_length parameter Closes #20713	2016-10-08 12:27:33 +02:00
Alexander Lin	7cd0316b51	Fix minhash docs level Relates #20547	2016-09-19 07:54:04 -04:00
Clinton Gormley	2f6d0119f1	Added warning messages about the dangers of pathological regexes to: * pattern-replace charfilter * pattern-capture and pattern-replace token filters * pattern tokenizer * pattern analyzer Relates to #20038	2016-09-09 09:53:07 +02:00
Alexander Lin	f825e8f4cb	Exposing lucene 6.x minhash filter. (#20206 ) Exposing lucene 6.x minhash tokenfilter Generate min hash tokens from an incoming stream of tokens that can be used to estimate document similarity. Closes #20149	2016-09-07 09:38:12 +02:00
Jim Ferenczi	4682fc34ae	Add the ability to disable the retrieval of the stored fields entirely This change adds a special field named _none_ that allows disabling the retrieval of the stored fields in a search request or in a TopHitsAggregation. To completely disable stored fields retrieval (including disabling metadata fields retrieval such as _id or _type) use _none_ like this: ```` POST _search { "stored_fields": "_none_" } ````	2016-08-24 16:40:08 +02:00
markwalkom	f556424ab9	Update synonym-tokenfilter.asciidoc (#19988 ) * Update synonym-tokenfilter.asciidoc * Update synonym-tokenfilter.asciidoc	2016-08-17 13:39:22 +02:00
Nik Everett	7aeea764ba	Remove wait_for_status=yellow from the docs It is no longer required after `687e2e12b3`.	2016-07-15 16:02:07 -04:00
Clinton Gormley	6f17736eb1	Fixed asciidoc	2016-07-15 12:58:38 +02:00
Jim Ferenczi	881afcba60	Fixed tests that failed now that BM25 is the default similarity.	2016-06-21 15:42:42 +02:00
Nik Everett	a0585269be	[docs] s/lags/Flags/ Copy and paste lost an `F`.	2016-06-09 13:08:53 -04:00
Nik Everett	09cc4c449a	[docs] Pattern replace char filter now supports flags	2016-06-09 12:41:20 -04:00
Clinton Gormley	5da9e5dcbc	Docs: Improved tokenizer docs (#18356 ) * Docs: Improved tokenizer docs Added descriptions and runnable examples * Addressed Nik's comments * Added TESTRESPONSEs for all tokenizer examples * Added TESTRESPONSEs for all analyzer examples too * Added docs, examples, and TESTRESPONSES for character filters * Skipping two tests: One interprets "$1" as a stack variable - same problem exists with the REST tests The other because the "took" value is always different * Fixed tests with "took" * Fixed failing tests and removed preserve_original from fingerprint analyzer	2016-05-19 19:42:23 +02:00
Nik Everett	8155e1efda	[docs] Add wait_for_status=yellow Another unstable snippet.... https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+master+multijob-os-compatibility/os=sles/402/console	2016-05-12 17:53:34 -04:00
Zachary Tong	5ee5cc25cc	Move AsciiFolding earlier in FingerprintAnalyzer filter chain Rearranges the FingerprintAnalyzer so that AsciiFolding comes earlier in the chain (after lowercasing, before stop removal, for maximum deduping power) Closes #18266	2016-05-12 09:34:15 -04:00
Clinton Gormley	97a41ee973	First pass at improving analyzer docs (#18269 ) * Docs: First pass at improving analyzer docs I've rewritten the intro to analyzers plus the docs for all analyzers to provide working examples. I've also removed: * analyzer aliases (see #18244) * analyzer versions (see #18267) * snowball analyzer (see #8690) Next steps will be tokenizers, token filters, char filters * Fixed two typos	2016-05-11 14:17:56 +02:00
Clinton Gormley	3f594089c2	Renamed all AUTOSENSE snippets to CONSOLE (#18210 )	2016-05-09 15:42:23 +02:00
Nik Everett	3912761572	[docs] Add wait_until_yellow to fix build failure The snippet in the docs creates an index and uses it with the _analyze api. The trouble is that if the index hasn't been created fully the _analyze API will fail. This adds a GET _cluster/health?wait_for_status=yellow which fixes the issue. While this does make the docs more cluttered, it also makes the snippets actually runnable. Closes #18165	2016-05-05 16:02:00 -04:00
Nik Everett	4b1c116461	Generate and run tests from the docs Adds infrastructure so `gradle :docs:check` will extract tests from snippets in the documentation and execute the tests. This is included in `gradle check` so it should happen on CI and during a normal build. By default each `// AUTOSENSE` snippet creates a unique REST test. These tests are executed in a random order and the cluster is wiped between each one. If multiple snippets chain together into a test you can annotate all snippets after the first with `// TEST[continued]` to have the generated tests for both snippets joined. Snippets marked as `// TESTRESPONSE` are checked against the response of the last action. See docs/README.asciidoc for lots more. Closes #12583. That issue is about catching bugs in the docs during build. This catches some bugs in the docs during build which is a good start.	2016-05-05 13:58:03 -04:00
Zachary Tong	80288ad60c	Add `fingerprint` token filter and `fingerprint` analyzer Adds a `fingerprint` token filter which uses Lucene's FingerprintFilter, and a `fingerprint` analyzer that combines the Fingerprint filter with lowercasing, stop word removal and asciifolding. Closes #13325	2016-04-20 16:10:56 -04:00
Clinton Gormley	a62b9296c6	Docs: Fixed link to phonetic plugin	2016-04-13 10:17:46 +02:00
Adrien Grand	b42f66c8ac	Document 5.0 mapping changes.	2016-03-22 16:22:58 +01:00
Clinton Gormley	dc21ab7576	Docs: Corrected behaviour of max_token_length in standard tokenizer	2016-03-18 10:58:16 +01:00
Clinton Gormley	a5a9bbfe88	Update compound-word-tokenfilter.asciidoc Only FOP v1.2 compatible hyphenation files are supported by the hyphenation decompounder	2016-03-11 15:08:36 +01:00
Lee Hinman	6adbbff97c	Fix organization rename in all files in project Basically a query-replace of "https://github.com/elasticsearch/" with "https://github.com/elastic/"	2016-03-03 12:04:13 -07:00
Andrey Ryaguzov	f744c3f724	Docs: Added migration description for custom analysis file path Closes #15597 Closes #15556	2016-02-29 20:56:19 +01:00
Dongjoon Hyun	21ea552070	Fix typos in docs.	2016-02-09 02:07:32 -08:00
Adrien Grand	f8e802c028	Merge pull request #15794 from damienalexandre/french-doc [Doc] Fix french analyzer elision token filter doc	2016-01-06 18:39:26 +01:00
Damien Alexandre	23a64f8214	Fix french analyzer elision token filter doc Fix #15774	2016-01-06 18:26:03 +01:00
David Pilato	995e796eab	[doc] Fix cross link with ICU plugin Doc bug introduced with #15695	2015-12-30 12:07:33 +01:00
David Pilato	3076377fdb	Remove ICU Plugin in reference guide This documentation lives now in plugins documentation at https://www.elastic.co/guide/en/elasticsearch/plugins/current/analysis-icu.html. We don't need a copy in analysis reference guide.	2015-12-29 11:23:28 +01:00
socurites	485915bbe7	Comma (,) was duplicated; deleted it.	2015-12-24 14:31:26 +01:00
socurites	25d23091e2	Edge NGram: "side" setting was deprecated Edge NGram: "side" setting was deprecated	2015-12-24 14:26:24 +01:00
Jason Tedor	d9a24961c5	Fix minor issues in delimited payload token filter docs This commit addresses a few minor issues in the delimited payload token filter docs: - the provided example reversed the payloads associated with the tokens "the" and "fox" - two additional typos in the same sentence - "per default" -> "by default" - "default int to" -> "default into" - adds two serial commas	2015-12-16 13:00:20 -05:00
tomoya yokota	82d26c852a	property name is not right `ignore_script` is not right. `ignored_script` is right. See org.elasticsearch.index.analysis.CJKBigramFilterFactory	2015-11-26 14:22:23 +09:00
Clinton Gormley	98028419a5	Merge pull request #14610 from yokotaso/patch-1 Update snowball document page.	2015-11-17 14:17:30 +01:00
Jason O'Donnell	42fb690a1c	Fixing typo	2015-10-26 16:46:36 -04:00
Adrien Grand	d3aa3565db	Deprecate `index.analysis.analyzer.default_index` in favor of `index.analysis.analyzer.default`. Close #11861	2015-10-12 22:19:16 +02:00
Clinton Gormley	1f76f49003	Update compound-word-tokenfilter.asciidoc Improved the docs for compound word token filter. Closes #13670 Closes #13595	2015-09-21 11:22:14 +02:00
Robert Muir	f216d92d19	Upgrade to lucene 5.4-snapshot r1701068	2015-09-03 15:13:33 -04:00
Robert Muir	0d3e3f81fc	Lithuanian analysis	2015-09-01 08:52:10 -04:00
xuzha	fb2be6d6a1	The name "position_offset_gap" is confusing because Lucene has three similar sounding things: * Analyzer#getPositionIncrementGap * Analyzer#getOffsetGap * IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS and * FieldType#storeTermVectorOffsets Rename position_offset_gap to position_increment_gap closes #13056	2015-08-26 14:56:35 -07:00
Nik Everett	4b9664beeb	Mapping: Default position_offset_gap to 100 This is much more fiddly than you'd expect it to be because of the way position_offset_gap is applied in StringFieldMapper. Instead of setting the default to 100 it's simpler to make sure that all the analyzers default to 100 and that StringFieldMapper doesn't override the default unless the user specifies something different. Unless the index was created before 2.1, in which case the old default of 0 has to take. Also position_offset_gaps less than 0 aren't allowed at all. New tests test that: 1. the new default doesn't match phrases across values with reasonably low slop (5) 2. the new default does match phrases across values with reasonably high slop (50) 3. you can override the value and phrases work as you'd expect 4. if you leave the value undefined in the mapping and define it on a custom analyzer the value from the custom analyzer shines through Closes #7268	2015-08-25 14:21:50 -04:00
Clinton Gormley	2b512f1f29	Docs: Use "js" instead of "json" and "sh" instead of "shell" for source highlighting	2015-07-14 18:14:09 +02:00
Britta Weber	eeeb29f900	spell correct and add single quotes	2015-05-26 11:41:19 +02:00
Britta Weber	37782c1745	analyzers: custom analyzers names and aliases must not start with _ closes #9596	2015-05-26 11:38:15 +02:00
Clinton Gormley	3a69b65e88	Docs: Fixed the backslash escaping on the pattern analyzer docs Closes #11099	2015-05-15 18:40:16 +02:00
Ryan Ernst	ba68d354c4	Merge pull request #10934 from mattweber/custom_analyzer_pos_offset_gap document and test custom analyzer position offset gap	2015-05-04 08:56:50 -07:00
Matt Weber	63c4a214db	document and test custom analyzer position offset gap	2015-05-04 08:53:45 -07:00
Robert Muir	4b3672b7df	Add migration note for hunspell dictionaries	2015-05-04 10:00:05 -04:00
Clinton Gormley	cf177c32d4	Docs: Fixed pattern-capture token filter example Closes #10690	2015-04-25 19:27:55 +02:00
Benoit Delbosc	4a94e1f14b	Docs: Warning about the conflict with the Standard Tokenizer The examples given require a specific tokenizer to work. Closes: 10645	2015-04-23 21:16:30 +02:00
Glen Smith	3d5fbfb997	Docs: Update pattern-replace-charfilter.asciidoc Remove invalid trailing comma from json Closes #9477	2015-01-29 20:24:08 +01:00
Lee Hinman	2f6527f491	[DOCS] Update documentation for `max_token_length` In 1.4 the behavior is different due to https://issues.apache.org/jira/browse/LUCENE-5897	2015-01-27 13:52:14 -07:00
David Haney	395960feef	Docs: Updated standard token filter docs to indicate true behavior: doing nothing Closes #9300	2015-01-15 21:33:29 +01:00
Tomoya Hirano	15d46988dc	Fix typo in sample json Fixes #9253	2015-01-12 15:58:16 +00:00
Michael McCandless	dfb6d6081c	Core: upgrade to current Lucene 5.0.0 snapshot Elasticsearch no longer unlocks the Lucene index on startup (this was dangerous, and could possibly lead to corruption). Added the new serbian_normalization TokenFilter from Lucene. NoLockFactory is no longer supported (index.store.fs.fs_lock = none), and if you have a typo in your fs_lock you'll now hit a StoreException instead of silently using NoLockFactory. Closes #8588	2014-11-24 05:08:42 -05:00
Robert Muir	610ce078fb	Upgrade master to lucene 5.0 snapshot This has a lot of improvements in lucene, particularly around memory usage, merging, safety, compressed bitsets, etc. On the elasticsearch side, summary of the larger changes: API changes: postings API became a "pull" rather than "push", collector API became per-segment, etc. packaging changes: add lucene-backwards-codecs.jar as a dependency. improvements to boolean filtering: especially ensuring it will not be slow for SparseBitSet. use generic BitSet api in plumbing so that concrete bitset type is an implementation detail. use generic BitDocIdSetFilter api for dedicated bitset cache, so there is type safety. changes to support atomic commits implement Accountable.getChildResources (detailed memory usage API) for fielddata, etc change handling of IndexFormatTooOld/New, since they no longer extends CorruptIndexException Closes #8347. Squashed commit of the following: commit `d90d53f5f2` Author: Simon Willnauer <simonw@apache.org> Date: Wed Nov 5 21:35:28 2014 +0100 Make default codec/postings/docvalues format constants commit `cb66c22c71` Merge: `d4e2f6d` `ad4ff43` Author: Robert Muir <rmuir@apache.org> Date: Wed Nov 5 11:41:13 2014 -0500 Merge branch 'master' into enhancement/lucene_5_0_upgrade commit `d4e2f6dfe7` Merge: `4e5445c` `4111d93` Author: Robert Muir <rmuir@apache.org> Date: Wed Nov 5 06:26:32 2014 -0500 Merge branch 'master' into enhancement/lucene_5_0_upgrade commit `4e5445c775` Author: Robert Muir <rmuir@apache.org> Date: Tue Nov 4 16:19:19 2014 -0500 FixedBitSet -> BitSet commit `9887ea73e8` Merge: `1bf8894` `fc84666` Author: Robert Muir <rmuir@apache.org> Date: Tue Nov 4 15:26:25 2014 -0500 Merge branch 'master' into enhancement/lucene_5_0_upgrade commit `1bf8894430` Author: Robert Muir <rmuir@apache.org> Date: Tue Nov 4 15:22:51 2014 -0500 remove nocommit commit `a9c2a2259f` Author: Robert Muir <rmuir@apache.org> Date: Tue Nov 4 13:48:43 2014 -0500 turn jenkins red again commit 
`067baaaa4d` Author: Robert Muir <rmuir@apache.org> Date: Tue Nov 4 13:18:21 2014 -0500 unzip from stream commit `82b6fba33d` Merge: b2214bb `6523cd9` Author: Robert Muir <rmuir@apache.org> Date: Tue Nov 4 13:10:59 2014 -0500 Merge branch 'master' into enhancement/lucene_5_0_upgrade commit `b2214bb093` Author: Robert Muir <rmuir@apache.org> Date: Tue Nov 4 13:09:53 2014 -0500 go back to my URL until we can figure out what is up with jenkins commit `e7d6141722` Author: Robert Muir <rmuir@apache.org> Date: Tue Nov 4 10:52:54 2014 -0500 try this jenkins commit `337a3c7704` Author: Simon Willnauer <simonw@apache.org> Date: Tue Nov 4 16:17:49 2014 +0100 Rename temp-files under lock to prevent metadata reads while renaming commit `77d5ba80d0` Author: Robert Muir <rmuir@apache.org> Date: Tue Nov 4 10:07:11 2014 -0500 continue to treat too-old/too-new as corruption for now commit `98d0fd2f48` Author: Robert Muir <rmuir@apache.org> Date: Tue Nov 4 09:24:21 2014 -0500 fix last nocommit commit `643fceed66` Author: Simon Willnauer <simonw@apache.org> Date: Tue Nov 4 14:46:17 2014 +0100 remove NoSuchDirectoryException commit `2e43c4feba` Merge: `93826e4` `8163107` Author: Simon Willnauer <simonw@apache.org> Date: Tue Nov 4 14:38:00 2014 +0100 Merge branch 'master' into enhancement/lucene_5_0_upgrade commit `93826e4d56` Merge: `7f10129` `44e24d3` Author: Simon Willnauer <simonw@apache.org> Date: Tue Nov 4 12:54:27 2014 +0100 Merge branch 'master' into enhancement/lucene_5_0_upgrade Conflicts: src/main/java/org/elasticsearch/index/store/DistributorDirectory.java src/main/java/org/elasticsearch/index/store/Store.java src/main/java/org/elasticsearch/indices/recovery/RecoveryStatus.java src/test/java/org/elasticsearch/index/store/DistributorDirectoryTest.java src/test/java/org/elasticsearch/index/store/StoreTest.java src/test/java/org/elasticsearch/indices/recovery/RecoveryStatusTests.java commit `7f10129364` Author: Adrien Grand <jpountz@gmail.com> Date: Tue Nov 4 11:32:24 2014 
+0100 Fix TopHitsAggregator to not ignore the top-level/leaf collector split. commit `042fadc860` Author: Adrien Grand <jpountz@gmail.com> Date: Tue Nov 4 11:31:20 2014 +0100 Remove MatchDocIdSet in favor of DocValuesDocIdSet. commit `7d877581ff` Author: Adrien Grand <jpountz@gmail.com> Date: Tue Nov 4 11:10:08 2014 +0100 Make the and filter use the cost API. Lucene 5 ensured that cost() can safely be used, and this will have the benefit that the order in which filters are specified is not important anymore (only for slow random-access filters in practice). commit `78f1718aa2` Author: Robert Muir <rmuir@apache.org> Date: Mon Nov 3 23:55:17 2014 -0500 fix previous eclipse import braindamage commit `186c40e925` Author: Robert Muir <rmuir@apache.org> Date: Mon Nov 3 22:32:34 2014 -0500 allow child queries to exhaust iterators again commit `b0b1271305` Author: Ryan Ernst <ryan@iernst.net> Date: Mon Nov 3 14:50:44 2014 -0800 Fix nocommit for mapping output. index_options will not be printed if the field is not indexed. commit `ba223eb85e` Author: Ryan Ernst <ryan@iernst.net> Date: Mon Nov 3 14:07:26 2014 -0800 Remove no commit for chinese analyzer provider. We should have a separate issue to address not using this provider on new indexes. 
commit `ca554b03c4` Author: Ryan Ernst <ryan@iernst.net> Date: Mon Nov 3 13:41:59 2014 -0800 Fix stop tests commit `de67c4653e` Author: Ryan Ernst <ryan@iernst.net> Date: Mon Nov 3 12:51:17 2014 -0800 Remove analysis nocommits, switching over to Lucene43*Filters for backcompat commit `50cae9bec7` Author: Robert Muir <rmuir@apache.org> Date: Mon Nov 3 15:32:25 2014 -0500 add ram accounting and TODO lazy-loading (its no worse than master, can be a followup improvement) for suggesters commit `7a7f0122f1` Author: Robert Muir <rmuir@apache.org> Date: Mon Nov 3 15:11:26 2014 -0500 bump lucene version commit `cd0cae5c35` Merge: `446bc09` `3c72073` Author: Robert Muir <rmuir@apache.org> Date: Mon Nov 3 14:49:05 2014 -0500 Merge branch 'master' into enhancement/lucene_5_0_upgrade commit `446bc09b4e` Author: Robert Muir <rmuir@apache.org> Date: Mon Nov 3 14:46:30 2014 -0500 remove hack commit `a19d85a968` Author: Robert Muir <rmuir@apache.org> Date: Mon Nov 3 12:53:11 2014 -0500 dont create exceptions with circular references on corruption (will open a PR for this) commit `0beefb9e82` Author: Robert Muir <rmuir@apache.org> Date: Mon Nov 3 11:47:14 2014 -0500 temporarily add craptastic detector for this horrible bug commit `e9f2d298bf` Author: Robert Muir <rmuir@apache.org> Date: Mon Nov 3 10:56:01 2014 -0500 add nocommit commit `e97f1d50a9` Merge: `c57a3c8` `f1f50ac` Author: Robert Muir <rmuir@apache.org> Date: Mon Nov 3 10:12:12 2014 -0500 Merge branch 'master' into enhancement/lucene_5_0_upgrade commit `c57a3c8341` Author: Robert Muir <rmuir@apache.org> Date: Mon Nov 3 10:11:46 2014 -0500 fix nocommit commit `dd0e77e4ec` Author: Robert Muir <rmuir@apache.org> Date: Mon Nov 3 09:54:09 2014 -0500 nocommit -> TODO, this is in much more places in the codebase, bigger issue commit `3cc3bf56d7` Author: Ryan Ernst <ryan@iernst.net> Date: Sat Nov 1 23:59:17 2014 -0700 Remove nocommit and awaitsfix for edge ngram filter test. 
commit `89f1152451` Author: Ryan Ernst <ryan@iernst.net> Date: Sat Nov 1 23:57:44 2014 -0700 Fix EdgeNGramTokenFilter logic for version <= 4.3, and fixed instanceof checks in corresponding tests to correctly check for reverse filter when applicable. commit `112df869cd` Author: Robert Muir <rmuir@apache.org> Date: Sun Nov 2 00:08:30 2014 -0400 execute geo disjoint query/filter as intersects commit `e5061273cc` Author: Robert Muir <rmuir@apache.org> Date: Sat Nov 1 22:58:59 2014 -0400 remove chinese analyzer from docs commit `ea1af11b89` Author: Robert Muir <rmuir@apache.org> Date: Sat Nov 1 22:29:00 2014 -0400 fix ram accounting bug commit `53c0a42c6a` Merge: `e3bcd3c` `6011a18` Author: Robert Muir <rmuir@apache.org> Date: Sat Nov 1 22:16:29 2014 -0400 Merge branch 'master' into enhancement/lucene_5_0_upgrade commit `e3bcd3cc07` Author: Robert Muir <rmuir@apache.org> Date: Sat Nov 1 22:15:01 2014 -0400 fix url-email back compat (thanks ryan) commit `91d6b096a9` Author: Robert Muir <rmuir@apache.org> Date: Sat Nov 1 22:11:26 2014 -0400 bump lucene version commit `d2bb9568df` Author: Robert Muir <rmuir@apache.org> Date: Sat Nov 1 20:33:07 2014 -0400 remove nocommit commit `1d049c471e` Author: Robert Muir <rmuir@apache.org> Date: Sat Nov 1 20:28:58 2014 -0400 fix eclipse to group org/com imports together: without this, its madness commit `09d8c1585e` Author: Robert Muir <rmuir@apache.org> Date: Sat Nov 1 14:27:41 2014 -0400 remove nocommit, if you dont liek it, print assembly and tell me how it can be better commit `8a6a294313` Author: Adrien Grand <jpountz@gmail.com> Date: Fri Oct 31 20:01:55 2014 +0100 Remove deprecated usage of DocIdSets.newDocIDSet. 
commit `601bee6054` Author: Robert Muir <rmuir@apache.org> Date: Fri Oct 31 14:13:18 2014 -0400 maybe one of these zillions of annotations will stop thread leaks commit `9d3f69abc7` Author: Robert Muir <rmuir@apache.org> Date: Fri Oct 31 14:05:39 2014 -0400 fix some analysis nocommits commit `312e3a29c7` Author: Adrien Grand <jpountz@gmail.com> Date: Fri Oct 31 18:28:45 2014 +0100 Remove XConstantScoreQuery/XFilteredQuery/ApplyAcceptedDocsFilter. commit `5a0cb9f8e1` Author: Adrien Grand <jpountz@gmail.com> Date: Fri Oct 31 17:06:45 2014 +0100 Fix misleading documentation of DocIdSets.toCacheable. commit `8b4ef2b5b4` Author: Adrien Grand <jpountz@gmail.com> Date: Fri Oct 31 17:05:59 2014 +0100 Fix CustomRandomAccessFilterStrategy to override the right method. commit `d7a9a407a6` Author: Adrien Grand <jpountz@gmail.com> Date: Fri Oct 31 16:21:35 2014 +0100 Better handle the special case when there is a single SHOULD clause. commit `648ad389f0` Author: Adrien Grand <jpountz@gmail.com> Date: Fri Oct 31 15:53:38 2014 +0100 Cut over XBooleanFilter to BitDocIdSet.Builder. The idea is similar to what happened to Lucene's BooleanFilter. Yet XBooleanFilter is a bit more sophisticated and I had to slightly change the way it is implemented in order to make it work. The main difference with before is that slow filters are now applied lazily, so eg. if you have 3 MUST clauses, two with a fast iterator and the third with a slow iterator, the previous implementation used to apply the fast iterators first and then only check the slow filter for bits which were set in the bit set. Now we are computing a bit set based on the fast must clauses and then basically returning a BitsFilteredDocIdSet.wrap(bitset, slowClause). Other than that, BooleanFilter still uses the bitset optimizations when or-ing and and-ind filters. Another improvement is that BooleanFilter is now aware of the cost API. 
commit `b2dad312b4` Author: Robert Muir <rmuir@apache.org> Date: Fri Oct 31 10:18:53 2014 -0400 clear nocommit commit `4851d2091e` Author: Simon Willnauer <simonw@apache.org> Date: Fri Oct 31 15:15:16 2014 +0100 cut over to RoaringDocIdSet commit `ca6aec24a9` Author: Simon Willnauer <simonw@apache.org> Date: Fri Oct 31 14:57:30 2014 +0100 make nocommit more explicit commit `d0742ee2cb` Author: Robert Muir <rmuir@apache.org> Date: Fri Oct 31 09:55:24 2014 -0400 fix standardtokenizer nocommit commit `7d6faccaff` Author: Simon Willnauer <simonw@apache.org> Date: Fri Oct 31 14:54:08 2014 +0100 fix compilation commit `a038a405c1` Author: Simon Willnauer <simonw@apache.org> Date: Fri Oct 31 14:53:43 2014 +0100 fix compilation commit `30c9e307b1` Author: Simon Willnauer <simonw@apache.org> Date: Fri Oct 31 14:52:35 2014 +0100 fix compilation commit `e5139bc5a0` Author: Robert Muir <rmuir@apache.org> Date: Fri Oct 31 09:52:16 2014 -0400 clear nocommit here commit `85dd2cedf7` Author: Simon Willnauer <simonw@apache.org> Date: Fri Oct 31 14:46:17 2014 +0100 fix CompletionPostingsFormatTest commit `c0f3781f61` Author: Robert Muir <rmuir@apache.org> Date: Fri Oct 31 09:38:00 2014 -0400 add tests for these analyzers commit `51f9999b4a` Author: Simon Willnauer <simonw@apache.org> Date: Fri Oct 31 14:10:26 2014 +0100 remove nocommit - this is not an issue commit `fd1388fa03` Author: Martijn van Groningen <martijn.v.groningen@gmail.com> Date: Fri Oct 31 14:07:01 2014 +0100 Remove redundant null check commit `3d6dd51b09` Author: Martijn van Groningen <martijn.v.groningen@gmail.com> Date: Fri Oct 31 14:01:37 2014 +0100 Removed the work around to prevent p/c error when invoking #iterator() twice, because the custom query filter wrapper now doesn't transform the result to a cache doc id set any more. I think the transforming to a cachable doc id set in CustomQueryWrappingFilter isn't needed at all, because we use the DocIdSet only once and because of that is just slowed things down. 
commit `821832a537` Author: Simon Willnauer <simonw@apache.org> Date: Fri Oct 31 13:54:33 2014 +0100 one more nocommit commit `77eb9ea4c4` Author: Martijn van Groningen <martijn.v.groningen@gmail.com> Date: Fri Oct 31 13:52:29 2014 +0100 Remove cast commit `a400573c03` Author: Simon Willnauer <simonw@apache.org> Date: Fri Oct 31 13:49:24 2014 +0100 fix stop filter commit `51746087cf` Author: Simon Willnauer <simonw@apache.org> Date: Fri Oct 31 13:21:36 2014 +0100 fix changed semantics of FBS.nextSetBit to check for NO_MORE_DOCS commit `8d0a4e2511` Author: Robert Muir <rmuir@apache.org> Date: Fri Oct 31 08:13:44 2014 -0400 do the bogus cast differently commit `46a5cc5732` Author: Simon Willnauer <simonw@apache.org> Date: Fri Oct 31 13:00:16 2014 +0100 I hate it but P/C now passes commit `580c0c2f82` Merge: `a9d3c00` `1645434` Author: Robert Muir <rmuir@apache.org> Date: Fri Oct 31 06:54:31 2014 -0400 fix nocommit/classcast commit `a9d3c004d6` Author: Adrien Grand <jpountz@gmail.com> Date: Fri Oct 31 08:49:31 2014 +0100 Update TODO. 
commit `aa75af0b40` Author: Robert Muir <rmuir@apache.org> Date: Thu Oct 30 19:18:25 2014 -0400 clear obselete nocommits from lucene bump commit `d438534cf4` Author: Robert Muir <rmuir@apache.org> Date: Thu Oct 30 18:53:20 2014 -0400 throw classcastexception when ES abuses regular filtercache for nested docs commit `2c751f3a8f` Author: Robert Muir <rmuir@apache.org> Date: Thu Oct 30 18:31:34 2014 -0400 bump lucene revision, fix tests commit `d6ef7f6304` Author: Simon Willnauer <simonw@apache.org> Date: Thu Oct 30 22:37:58 2014 +0100 fix merge problems commit `de9d361f88` Merge: `41f6aab` `f6b37a3` Author: Simon Willnauer <simonw@apache.org> Date: Thu Oct 30 22:28:59 2014 +0100 Merge branch 'master' into enhancement/lucene_5_0_upgrade Conflicts: pom.xml src/main/java/org/elasticsearch/Version.java src/main/java/org/elasticsearch/gateway/local/state/meta/MetaDataStateFormat.java commit `41f6aab388` Author: Simon Willnauer <simonw@apache.org> Date: Thu Oct 30 17:48:46 2014 +0100 fix potiential NPE commit `c4428b12e1` Author: Simon Willnauer <simonw@apache.org> Date: Thu Oct 30 17:38:46 2014 +0100 don't advance iterator in a match(doc) method commit `28ab948e99` Author: Simon Willnauer <simonw@apache.org> Date: Thu Oct 30 17:34:58 2014 +0100 don't advance iterator in a match(doc) method commit `eb0f33f663` Author: Simon Willnauer <simonw@apache.org> Date: Thu Oct 30 16:55:54 2014 +0100 fix GeoUtilsTest commit `7f711fe3ea` Author: Simon Willnauer <simonw@apache.org> Date: Thu Oct 30 16:43:16 2014 +0100 Use a dedicated default index option if field type is not indexed by default commit `78e3f37ab7` Author: Robert Muir <rmuir@apache.org> Date: Thu Oct 30 10:56:14 2014 -0400 disable this test with AwaitsFix to reduce noise commit `9a590f563c` Author: Simon Willnauer <simonw@apache.org> Date: Thu Oct 30 09:38:49 2014 +0100 fix lucene version commit `abe3ca1d8b` Author: Simon Willnauer <simonw@apache.org> Date: Thu Oct 30 09:35:05 2014 +0100 fix 
AnalyzingCompletionLookupProvider to wrok with new codec API commit `464293b245` Author: Robert Muir <rmuir@apache.org> Date: Thu Oct 30 00:26:00 2014 -0400 don't try to write stuff to tests class directory commit `031cc6c19f` Author: Robert Muir <rmuir@apache.org> Date: Thu Oct 30 00:12:36 2014 -0400 AwaitsFix these known issues to reduce noise commit `4600d51891` Author: Robert Muir <rmuir@apache.org> Date: Thu Oct 30 00:06:53 2014 -0400 openbitset lives on commit `8492bae056` Author: Robert Muir <rmuir@apache.org> Date: Wed Oct 29 23:42:54 2014 -0400 fixes for filter tests commit `31f24ce4ef` Author: Robert Muir <rmuir@apache.org> Date: Wed Oct 29 23:12:38 2014 -0400 don't use fieldcache commit `8480789942` Author: Robert Muir <rmuir@apache.org> Date: Wed Oct 29 23:04:29 2014 -0400 ancient index no longer supported commit `02e78dc7eb` Author: Simon Willnauer <simonw@apache.org> Date: Wed Oct 29 23:37:02 2014 +0100 fix more tests commit `ff746c6df2` Author: Simon Willnauer <simonw@apache.org> Date: Wed Oct 29 23:08:19 2014 +0100 fix all mapper commit `e4fb84b517` Author: Simon Willnauer <simonw@apache.org> Date: Wed Oct 29 22:55:54 2014 +0100 fix distributor tests and cut over to FileStore API commit `20c850e2cf` Author: Simon Willnauer <simonw@apache.org> Date: Wed Oct 29 22:42:18 2014 +0100 use DOCS_ONLY if index=true and current options == null commit `44169c1084` Author: Simon Willnauer <simonw@apache.org> Date: Wed Oct 29 22:33:36 2014 +0100 Fix index=yes\|no settings in mappers commit `a3c5f77987` Author: Simon Willnauer <simonw@apache.org> Date: Wed Oct 29 21:51:41 2014 +0100 fix several field mappers conversion from setIndexed to indexOptions commit `df84d73690` Author: Simon Willnauer <simonw@apache.org> Date: Wed Oct 29 21:33:35 2014 +0100 fix SourceFieldMapper to be not indexed commit `b2bf01d12a` Author: Simon Willnauer <simonw@apache.org> Date: Wed Oct 29 21:23:08 2014 +0100 Cut over to .liv files in store and corruption tests commit `619004df43` 
Author: Simon Willnauer <simonw@apache.org> Date: Wed Oct 29 17:05:52 2014 +0100 fix more tests commit `b7ed653a8b` Author: Simon Willnauer <simonw@apache.org> Date: Wed Oct 29 16:19:08 2014 +0100 [STORE] Add dedicated method to write temporary files Recovery writes temporary files which might not end up in the right distributor directories today. This commit adds a dedicated API that allows specifying the target file name in order to create the tempoary file in the correct directory. commit `7d574659f6` Author: Robert Muir <rmuir@apache.org> Date: Wed Oct 29 10:28:49 2014 -0400 add some leniency to temporary bogus method commit `f97022ea7c` Author: Robert Muir <rmuir@apache.org> Date: Wed Oct 29 10:24:17 2014 -0400 fix MultiCollector bug commit `b760533128` Author: Simon Willnauer <simonw@apache.org> Date: Wed Oct 29 14:56:08 2014 +0100 CheckIndex is now closeable we need to close it commit `9dae9fb6d6` Author: Simon Willnauer <simonw@apache.org> Date: Wed Oct 29 14:45:11 2014 +0100 s/Lucene51/Lucene50 commit `7aea9b8685` Author: Simon Willnauer <simonw@apache.org> Date: Wed Oct 29 14:42:30 2014 +0100 fix BloomFilterPostingsFormat commit `16fea6fe84` Author: Simon Willnauer <simonw@apache.org> Date: Wed Oct 29 14:41:16 2014 +0100 fix some codec format issues commit `3d77aa97dd` Author: Simon Willnauer <simonw@apache.org> Date: Wed Oct 29 14:30:43 2014 +0100 fix CodecTests commit `6ef823b1fd` Author: Simon Willnauer <simonw@apache.org> Date: Wed Oct 29 14:26:47 2014 +0100 make it compile commit `9991eee1fe` Author: Robert Muir <rmuir@apache.org> Date: Wed Oct 29 09:12:43 2014 -0400 add an ugly hack for TopHitsAggregator for now commit `03e768a01f` Author: Simon Willnauer <simonw@apache.org> Date: Wed Oct 29 14:01:02 2014 +0100 cut over ES090PostingsFormat commit `463d281faa` Merge: `0f8740a` `8eac79c` Author: Robert Muir <rmuir@apache.org> Date: Wed Oct 29 08:30:36 2014 -0400 Merge branch 'master' into enhancement/lucene_5_0_upgrade commit `0f8740a782` Author: 
Robert Muir <rmuir@apache.org> Date: Wed Oct 29 01:00:15 2014 -0400 fix/hack remaining filter and analysis issues commit `df53448856` Author: Robert Muir <rmuir@apache.org> Date: Tue Oct 28 23:11:47 2014 -0400 fix ngrams / openbitset usage commit `11f5dc3b98` Author: Robert Muir <rmuir@apache.org> Date: Tue Oct 28 22:42:44 2014 -0400 hack over sort comparators commit `4ebdc75435` Author: Robert Muir <rmuir@apache.org> Date: Tue Oct 28 21:27:07 2014 -0400 compiler errors < 100 commit `2d60c9e29d` Author: Robert Muir <rmuir@apache.org> Date: Tue Oct 28 03:13:08 2014 -0400 clear some nocommits around ram usage commit `aaf47fe6c0` Author: Robert Muir <rmuir@apache.org> Date: Mon Oct 27 12:27:34 2014 -0400 migrate fieldinfo handling commit `ef6ed6d15d` Author: Robert Muir <rmuir@apache.org> Date: Mon Oct 27 12:07:13 2014 -0400 more simple fixes commit `f475e1048a` Author: Robert Muir <rmuir@apache.org> Date: Mon Oct 27 11:58:21 2014 -0400 more fielddata ram accounting fixes commit `16b4239eaa` Author: Simon Willnauer <simonw@apache.org> Date: Mon Oct 27 16:47:32 2014 +0100 add missing file commit `5b542fa2a6` Author: Simon Willnauer <simonw@apache.org> Date: Mon Oct 27 16:43:29 2014 +0100 cut over completion posting formats - still some nocommits commit `ecdea49404` Author: Robert Muir <rmuir@apache.org> Date: Mon Oct 27 11:21:09 2014 -0400 fielddata accountable fixes commit `d43da26571` Author: Simon Willnauer <simonw@apache.org> Date: Mon Oct 27 16:19:53 2014 +0100 cut over BloomFilterPostings to new API commit `29b192ba62` Author: Robert Muir <rmuir@apache.org> Date: Mon Oct 27 10:22:51 2014 -0400 fix more analyzers commit `74b4a0c528` Author: Robert Muir <rmuir@apache.org> Date: Mon Oct 27 09:54:25 2014 -0400 fix tests commit `554084ccb4` Author: Simon Willnauer <simonw@apache.org> Date: Mon Oct 27 14:51:48 2014 +0100 maintain supressed exceptions on CorruptIndexException commit `cf882d9112` Author: Simon Willnauer <simonw@apache.org> Date: Mon Oct 27 14:47:17 2014 
+0100 commitOnClose=false commit `ebb2a9189a` Author: Simon Willnauer <simonw@apache.org> Date: Mon Oct 27 14:46:06 2014 +0100 cut over indexwriter closeing in InternalEngine commit `cd21b3d470` Author: Simon Willnauer <simonw@apache.org> Date: Mon Oct 27 14:38:10 2014 +0100 fix constant commit `f93f900c4a` Author: Robert Muir <rmuir@apache.org> Date: Mon Oct 27 09:50:49 2014 -0400 fix test commit `a9a752940b` Author: Martijn van Groningen <martijn.v.groningen@gmail.com> Date: Mon Oct 27 09:26:18 2014 +0100 Be explicit about the index options commit `d9ee815bab` Author: Simon Willnauer <simonw@apache.org> Date: Sun Oct 26 20:03:44 2014 +0100 cut over store and directory commit `b3f5c8e390` Author: Robert Muir <rmuir@apache.org> Date: Sun Oct 26 13:08:39 2014 -0400 more test fixes commit `8842f2684e` Author: Robert Muir <rmuir@apache.org> Date: Sun Oct 26 12:14:52 2014 -0400 tests manual labor commit `c43de5aec3` Author: Robert Muir <rmuir@apache.org> Date: Sun Oct 26 11:04:13 2014 -0400 BytesRef -> BytesRefBuilder commit `020c0d087a` Author: Martijn van Groningen <martijn.v.groningen@gmail.com> Date: Sun Oct 26 15:53:37 2014 +0100 Moved over to BitSetFilter commit `48dd1b909e` Author: Martijn van Groningen <martijn.v.groningen@gmail.com> Date: Sun Oct 26 15:53:11 2014 +0100 Left over Collector api change in ScanContext commit `6ec248ef63` Author: Martijn van Groningen <martijn.v.groningen@gmail.com> Date: Sun Oct 26 15:47:40 2014 +0100 Moved indexed() over to indexOptions != null or indexOptions == null commit `9937aebfd8` Author: Martijn van Groningen <martijn.v.groningen@gmail.com> Date: Sun Oct 26 13:26:31 2014 +0100 Fixed many compile errors. Mainly around the breaking Collector api change in 5.0. 
commit `fec32c4abc` Author: Robert Muir <rmuir@apache.org> Date: Sat Oct 25 11:22:17 2014 -0400 more easy fixes commit `dab22531d8` Author: Robert Muir <rmuir@apache.org> Date: Sat Oct 25 09:33:41 2014 -0400 more progress commit `414767e9a9` Author: Robert Muir <rmuir@apache.org> Date: Sat Oct 25 06:33:17 2014 -0400 more progress commit `ad9d969fdd` Author: Robert Muir <rmuir@apache.org> Date: Fri Oct 24 14:28:01 2014 -0400 current state of fun commit `464475eecb` Author: Robert Muir <rmuir@apache.org> Date: Fri Oct 24 11:42:41 2014 -0400 bump to 5.0 snapshot	2014-11-05 15:48:51 -05:00
Aarni Koskela	6011a18381	Docs: Add mention of `hyphenation_patterns_path` Refs ElasticSearch's HyphenationCompoundWordTokenFilterFactory.java. Closes #8305	2014-11-01 15:47:53 +01:00
Jun Ohtani	533c1084ec	Docs: add the predefined language-specific stopword lists to stop-tokenfilter.asciidoc	2014-10-16 13:20:38 +09:00
sp836490	517caa0c6f	Update cjk-bigram-tokenfilter.asciidoc	2014-10-15 11:54:19 +09:00
HenrikOssipoff	1445dd2308	Remove comma in JSON Closes #7827	2014-09-28 11:08:09 +02:00
Clinton Gormley	cb00d4a542	Docs: Removed all the added/deprecated tags from 1.x	2014-09-26 21:04:42 +02:00
Clinton Gormley	091578d117	Update stemmer-tokenfilter.asciidoc Change the `minimal_english` link to a publicly accessible URL	2014-09-25 20:29:12 +02:00
Sergii Golubev	059d9f757a	Docs: bad text wrapping On the page http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-synonym-tokenfilter.html even on a huge monitor the text is wrapped in the following way ``` mapping: ipod, i-pod, i pod => ipod, i-pod, i pod mapping: ipod, i-pod, i pod => ipod ``` So one can think that "mapping:" is not in a comment and is part of the syntax. But the lines are less than 80 chars, so perhaps the problem is in the page layout and there may be some other pages in the reference where the text is also being wrapped in an undesirable way. Closes #7739	2014-09-25 19:43:23 +02:00
Nik Everett	7bcd09a134	[docs] fix typo in language analyzer docs	2014-09-04 09:33:00 +02:00
Robert Muir	395744b0d2	[Analysis] Add missing docs for latvian analysis	2014-09-02 19:22:59 -04:00
Robert Muir	5c7cefa292	Analysis: Add keep_types for filtering by token type	2014-08-15 09:28:12 -04:00
Nik Everett	34426eb8c2	Docs: Fix syntax on lang-analyzer Some of the language analyzer documentation contained invalid json. Closes #7098	2014-07-30 20:17:27 +02:00
Simon Willnauer	5bfea56457	[DOCS] move all coming tags to added in master	2014-07-23 16:37:19 +02:00
Clinton Gormley	6e70edb0a4	Analysis: Improve Hunspell error messages The Hunspell service would throw a confusing error message if more than one affix file was present. This commit distinguishes between the two error cases: where there are no affix files and when there are too many affix files. Also implements lazy dictionary loading, which was used in the tests but not implemented. Closes #6850	2014-07-14 12:13:32 +02:00
Clinton Gormley	e4baa56f4b	Docs: Language analyzers Clarified the use of stem_exclusion and the keyword_marker token filter Closes #6613	2014-07-07 10:06:18 +02:00
Clinton Gormley	54790eea10	Update lang-analyzer.asciidoc Clarified the use of the `stem_exclusion` token filter. Closes #6613	2014-07-04 17:50:43 +02:00
Jun Ohtani	0c6a859357	Docs: fixed ICU plugin documentation add ICU Normalization CharFilter to docs Closes #6711	2014-07-03 15:21:51 +02:00
Mikhail Korobov	955473f475	Docs: unescape regexes in Pattern Tokenizer docs Currently regexes in Pattern Tokenizer docs are escaped (it seems according to Java rules). I think it is better not to escape them because JSON escaping should be automatic in client libraries, and string escaping depends on a client language used. The default pattern is `\W+`, not `\\W+`. Closes #6615	2014-07-03 13:34:13 +02:00
Robert Muir	2935b751e9	Fix doc formatting. Norwegian stemmers and Scandinavian normalizers were missing commas between entries.	2014-07-03 07:08:33 -04:00
Robert Muir	b9a09c2b06	Analysis: Add additional Analyzers, Tokenizers, and TokenFilters from Lucene Add `irish` analyzer Add `sorani` analyzer (Kurdish) Add `classic` tokenizer: specific to English text and tries to recognize hostnames, companies, acronyms, etc. Add `thai` tokenizer: segments Thai text into words. Add `classic` tokenfilter: cleans up acronyms and possessives from classic tokenizer Add `apostrophe` tokenfilter: removes text after apostrophe and the apostrophe itself Add `german_normalization` tokenfilter: umlaut/sharp S normalization Add `hindi_normalization` tokenfilter: accounts for Hindi spelling differences Add `indic_normalization` tokenfilter: accounts for different Unicode representations in Indian languages Add `sorani_normalization` tokenfilter: normalizes Kurdish text Add `scandinavian_normalization` tokenfilter: normalizes Norwegian, Danish, Swedish text Add `scandinavian_folding` tokenfilter: much more aggressive form of `scandinavian_normalization` Add additional languages to stemmer tokenfilter: `galician`, `minimal_galician`, `irish`, `sorani`, `light_nynorsk`, `minimal_nynorsk` Add support for access to the default Thai stopword set "_thai_" Fix some bugs and broken links in documentation. Closes #5935	2014-07-03 05:47:49 -04:00
Clinton Gormley	cf059378d1	Docs: Updated stop token filter docs	2014-06-21 18:42:38 +02:00
Clinton Gormley	69350dc426	Update stemmer-override-tokenfilter.asciidoc	2014-06-18 11:34:20 +02:00
Clinton Gormley	f546662e8f	Docs: Hunspell tidied Tidied some formatting	2014-06-11 21:49:02 +02:00
Clinton Gormley	04dacaaf27	Docs: Use the "stemmer" token filter for the english analyzer, to be consistent	2014-06-11 13:47:07 +02:00
Clinton Gormley	8a94b71b75	Docs: Corrected the use of keyword_marker on the lang analyzers	2014-06-11 13:43:02 +02:00
Clinton Gormley	673ef3db3f	The StemmerTokenFilter had a number of issues: * `english` returned the slow snowball English stemmer * `porter2` returned the snowball Porter stemmer (v1) * `portuguese` was used twice, preventing the second version from working Changes: * `english` now returns the fast PorterStemmer (for indices created from v1.3.0 onwards) * `porter2` now returns the snowball English stemmer (for indices created from v1.3.0 onwards) * `light_english` now returns the `kstem` stemmer (`kstem` still works) * `portuguese_rslp` returns the PortugueseStemmer * `dutch_kp` is a synonym for `kp` Tests and docs updated Fixes #6345 Fixes #6213 Fixes #6330	2014-06-11 12:30:16 +02:00
Clinton Gormley	e323e577e8	Docs: Fixed bad ref on cjk_width/bigram pages	2014-06-09 23:36:58 +02:00
Clinton Gormley	5e40868f44	Docs: Fixed a bad ref on lang analyzers page	2014-06-09 23:03:12 +02:00
Clinton Gormley	5c5c1da06c	Docs: Fixed some errors on the language analyzers page	2014-06-09 22:51:28 +02:00
Clinton Gormley	585b0ef730	Docs: Added custom-analyzer equivalents of all the language analyzers	2014-06-09 22:41:25 +02:00
Clinton Gormley	bc402d5f87	Docs: Documented the cjk_width and cjk_bigram token filters	2014-06-09 22:40:58 +02:00
Simon Willnauer	9d5507047f	Update Documentation Feature Flags [1.2.0]	2014-05-22 15:06:42 +02:00
Simon Willnauer	f79b28375d	Add missing coming tag Relates to #6188 Relates to #5539	2014-05-18 10:54:17 +02:00
Richard Boulton	fdb5eb6555	Update keyword-tokenizer.asciidoc	2014-05-07 15:04:07 +02:00
Matthieu Bacconnier	7fd5f18539	Update asciifolding-tokenfilter.asciidoc Typo	2014-05-06 16:30:09 +02:00
Ali Bozorgkhan	f1af845795	[DOCS] Fixed a typo Close #5963	2014-05-06 10:28:13 +02:00
Robert Muir	8e0a479316	Upgrade to Lucene 4.8 Closes #5932	2014-04-28 06:45:50 -04:00
Clinton Gormley	c1e03bf860	Update keyword-repeat-tokenfilter.asciidoc	2014-04-24 16:44:02 +02:00
Kevin Wang	374b633a4b	add uppercase token filter closes #5539	2014-03-26 15:07:43 +07:00
bleskes	5d832374dd	Update Documentation Feature Flags [1.1.0]	2014-03-25 17:51:30 +01:00
Clinton Gormley	4c34615686	[DOCS] Fixed some bad UTF8	2014-03-19 12:46:06 +01:00
Simon Willnauer	9160516b28	Expose `filler_token` via ShingleTokenFilterFactory Lucene 4.7 supports a setter for the `filler_token` that is inserted if there are gaps in the token stream. This change exposes this setting. Closes #4307	2014-02-26 22:21:10 +01:00
Nik Everett	5c3f4ceafb	Add preserve original token option to ASCIIFolding Closes #4931	2014-02-14 19:37:00 +01:00
Alexander Reelsen	c6155c5142	release [1.0.0.RC1]	2014-01-15 17:02:22 +00:00
Benjamin Vetter	ba8e012be9	Referring to stop analyzer for stopword docs #329	2014-01-14 11:53:30 +01:00
Benjamin Vetter	22a96e6a18	Added stopwords: _none_ to the docs #329	2014-01-14 11:53:29 +01:00
Simon Willnauer	7f63ddf94e	Default stopwords list should be `_none_` for all but language-specific analyzers `standard_html_strip` and `pattern` analyzer support stopwords which are set to the default `english` stopwords by default. Those analyzers should not use stopwords by default since they are language neutral Closes #4699	2014-01-13 14:44:10 +01:00
Yousef	302c762d5e	Wrong link to Token Filter	2013-12-03 10:39:13 +01:00
Lee Hinman	9939e81d88	[DOCS] Fix porter stem filter name in other stemming docs	2013-11-28 22:14:47 -07:00
Lee Hinman	fb4e903e35	[DOCS] Fix name of porter stemming token filter	2013-11-28 22:01:19 -07:00
Simon Willnauer	77bc5d5ecf	release [1.0.0.Beta1]	2013-11-06 15:32:43 +01:00
Simon Willnauer	9654631186	Change 'standard' analyzer to use empty stopword list by default. The 'default' / 'standard' analyzer can be a trappy default since it filters english stopwords by default. Yet a default should not be dedicated to a certain language since elasticsearch is used in many different scenarios where a standard analysis chain with specialization to english full-text might be rather counter productive. This commit changes the 'standard' analyzer to use an empty stopword list for indices that are created from 1.0.0.Beta1 version onwards but will maintain backwards compatibility for older indices. Closes #3775	2013-11-05 21:07:21 +01:00
Boaz Leskes	a9fdcadf01	[DOCS] Added documentation for the keep word token filter	2013-11-04 18:38:44 +01:00
Clinton Gormley	4206cc988e	[DOCS] Typo on shingle tokenfilter	2013-10-31 20:18:00 +01:00
Ben McCann	cc4bc7d57d	Fix nonsensical sentence in standard analyzer documentation so that it is more understandable	2013-10-25 00:18:32 +02:00
Alexander Reelsen	4d19239ec4	Add support for Lucene SuggestStopFilter The suggest stop filter is an improved version of the stop filter, which takes stopwords only into account if the last char of a query is a whitespace. This allows you to keep stopwords, but to allow suggesting for "a". Example: Index document content "a word". You are now able to suggest for "a" and get back results in the completion suggester, if the suggest stop filter is used on the query side, but will not get back any results for "a " as this is identified as a stopword. The implementation allows to set the `remove_trailing` parameter for a custom stop filter and thus use the suggest stop filter instead of the standard stop filter.	2013-10-15 16:12:02 +02:00
Britta Weber	c3ab79a10e	[DOCS] Add doc for delimited payload token filter	2013-10-14 13:41:35 +02:00
Clinton Gormley	d062409309	[DOCS] Removed enable_position_increments in stop filter	2013-10-05 17:06:13 +02:00
Clinton Gormley	ea05f4538c	[DOCS] Updated ICU-Plugin docs from the repo README	2013-10-05 16:31:52 +02:00
Lee Hinman	ba40aa374e	Uniquify anchor links to fix asciidoc/docbook generation	2013-09-30 15:32:00 -06:00
Lee Hinman	0442b737be	Add more anchor links to documentation Related to #3679	2013-09-30 13:13:16 -06:00
Adrien Grand	90524d7ad2	Fix formatting of the documentation. Remaining '@'s have been replaced with '`'s.	2013-09-18 12:35:44 +02:00
Clinton Gormley	393c28bee4	[DOCS] Removed outdated new/deprecated version notices	2013-09-03 21:28:31 +02:00
Boaz Leskes	e807c99f27	Fixed a typo in the config of light finnish stemmer (old last_finish is still supported for backward compatibility) Closes #3594	2013-08-29 10:15:40 +02:00
Clinton Gormley	822043347e	Migrated documentation into the main repo	2013-08-29 01:24:34 +02:00

336 Commits