Commit Graph

364 Commits

Author SHA1 Message Date
Jordan Powers a6c4c08527
Fix index version check in match_only_text (#130415)
Accidentally used the wrong backport index version in #130363.
2025-07-03 04:18:07 +10:00
Jordan Powers a69c48477f
Add index version for match_only_text stored field in binary format (#130363)
Follow-up to #130049 to gate using the binary format for the stored field
in match_only_text fields behind an index version.
2025-07-01 18:19:18 -07:00
Martijn van Groningen 15c0028c04
Fix match_only_text bugs if defined as multi-field (#130188)
* Fix match_only_text bugs if defined as multi-field

Bugs started to occur when #129126 was merged.

Closes #129737
2025-06-30 17:35:54 +10:00
Jordan Powers 40a7d02269
Pull match_only_text fixes into main (#130049)
This brings in the fixes from #130020, with minor fixes to address review
nits from that PR.

Co-authored-by: Martijn van Groningen <martijn.v.groningen@gmail.com>
2025-06-27 04:31:33 +10:00
Martijn van Groningen c0b2bb1f75
Stop configuring `index.number_of_replicas` in mapper-extras yaml tests. (#130099)
2025-06-26 17:59:03 +02:00
Parker Timmins 9aaba25d58
Simple version of patterned_text with a single doc value for arguments (#129292)
Initial version of patterned_text mapper. Behaves similarly to match_only_text. This version uses a single SortedSetDocValues for a template and another for arguments. It splits the message by delimiters, then classifies a token as an argument if it contains a digit. All arguments are concatenated and inserted as a single doc value. A single inverted index is used, without positions. Phrase queries are still possible, using the SourceConfirmedTextQuery, but are not fast.
2025-06-25 21:31:32 -05:00
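The template/argument split described in #129292 above could look roughly like the following. This is a minimal sketch, not the mapper's actual code: the delimiter set, the `%W` placeholder, and the space used to join arguments are all assumptions, since the commit message only says that tokens containing a digit are treated as arguments and concatenated.

```python
import re
from typing import Tuple

# Assumption: the commit message does not list the real delimiters.
DELIMITERS = re.compile(r"([\s,;:=]+)")
# Hypothetical marker left in the template where an argument was removed.
PLACEHOLDER = "%W"


def split_template_and_args(message: str) -> Tuple[str, str]:
    """Return (template, concatenated_arguments) for one log message."""
    template_parts = []
    arguments = []
    # re.split with a capturing group keeps the delimiters in the result,
    # so the template can be rebuilt with its original spacing.
    for piece in DELIMITERS.split(message):
        is_delimiter = DELIMITERS.fullmatch(piece) is not None
        if not is_delimiter and any(ch.isdigit() for ch in piece):
            # A token containing a digit is classified as an argument.
            arguments.append(piece)
            template_parts.append(PLACEHOLDER)
        else:
            template_parts.append(piece)
    # All arguments end up in one value, mirroring the single doc value.
    return "".join(template_parts), " ".join(arguments)
```

For a message such as `user 42 logged in from 10.0.0.1`, this sketch yields one template string with two placeholders and one argument string holding both values.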
Jordan Powers 5d1999781a
Use optimized text in match_only_text fields (#129371)
Follow-up to #126492 to use the json parsing optimizations for
match_only_text fields.

Relates to #129072.
2025-06-17 08:15:40 -07:00
Martijn van Groningen a0cc698fa2
Update multi field stored by default index version check (#129386)
Relates to #129126
2025-06-17 12:20:38 +02:00
Simon Cooper 3988ee1935
Check positions on MultiPhraseQueries as well as phrase queries (#129326)
2025-06-12 16:05:07 +01:00
Ignacio Vera f02a3c423f
Revert "Use IndexOrDocValuesQuery in NumberFieldType#termQuery implementations (#128293)" (#129206)
This reverts commit de7c91c1d9.
2025-06-12 10:10:29 +02:00
Martijn van Groningen 33af83a0ca
Synthetic source: avoid storing multi fields of type text and match_only_text by default. (#129126)
Don't store text and match_only_text fields by default when source mode is synthetic and the field is a multi-field or has a suitable multi-field.

Without this change, ES would otherwise store the field twice in a multi-field configuration.

For example:

```
...
"os": {
  "properties": {
    "name": {
      "ignore_above": 1024,
      "type": "keyword",
      "fields": {
        "text": {
          "type": "match_only_text"
        }
      }
    }
...
```

In this case, two stored fields were added: one for the `name` field and one for the `name.text` multi-field.
This change prevents this and never adds a stored field when a text or match_only_text field is a multi-field.
2025-06-10 16:32:47 +02:00
Benjamin Trent 2a44166a2c
Applying Apache Lucene fix: https://github.com/apache/lucene/pull/14732 (#128671)
* Applying Apache Lucene fix: https://github.com/apache/lucene/pull/14732

* fixing test

* fixing annot
2025-06-02 09:50:25 -04:00
Ignacio Vera de7c91c1d9
Use IndexOrDocValuesQuery in NumberFieldType#termQuery implementations (#128293)
2025-05-23 16:58:50 +02:00
Oleksandr Kolomiiets 0c1b3acee2
Properly handle multi fields in block loaders with synthetic source enabled (#127483)
2025-04-30 09:33:35 -07:00
Benjamin Trent 3d67e0e7ca
Fix npe when using source confirmed text query against missing field (#127414)
We should check for the field and statistics actually existing when
checking matches and explanation with `match_only_text` fields

closes: https://github.com/elastic/elasticsearch/issues/125635
2025-04-30 03:05:01 +10:00
Oleksandr Kolomiiets 26e2261132
Remove legacy block loader test infrastructure (#127273)
2025-04-25 10:26:27 -07:00
Oleksandr Kolomiiets 5e2b199b94
[TEST] Move test data generation out of logsdb namespace (#119994)
2025-04-23 08:29:32 -07:00
Jordan Powers 71e74bdd66
Store arrays offsets for scaled float fields natively with synthetic source (#125793)
This patch builds on the work in #113757, #122999, #124594, #125529, and 
#125709 to natively store array offsets for scaled float fields instead of
falling back to ignored source when synthetic_source_keep: arrays.
2025-03-28 20:26:29 +01:00
Oleksandr Kolomiiets 033d28e792
Use FallbackSyntheticSourceBlockLoader for shape and geo_shape (#124927)
2025-03-18 08:49:08 -07:00
Nik Everett 50aaa1c2a6
ESQL: Pragma to load from stored fields (#122891)
This creates a `pragma` you can use to request that fields load from a
stored field rather than doc values. It implements that pragma for
`keyword` and number fields.

We expect that, for some disk configurations and some number of fields,
it's faster to load those fields from _source or stored fields than
it is to use doc values. Our default is doc values and on my laptop it's
*always* faster to use doc values. But we don't ship my laptop to every
cluster.

This will let us experiment and debug slow queries by trying to load
fields a different way.

You access this pragma with:
```
curl -HContent-Type:application/json -XPOST localhost:9200/_query?pretty -d '{
    "query": "FROM foo",
    "pragma": {
        "field_extract_preference": "STORED"
    }
}'
```

On a release build you'll need to add `"accept_pragma_risks": true`.
2025-03-12 09:40:42 -04:00
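The request body from #122891 above can also be built programmatically. The following is a small sketch that only constructs the JSON and sends nothing. The helper name `build_esql_request` is hypothetical, and placing `accept_pragma_risks` at the top level of the body is an assumption, since the commit message does not say where the flag goes.

```python
import json


def build_esql_request(query: str,
                       field_extract_preference: str = "STORED",
                       accept_pragma_risks: bool = False) -> str:
    """Build the JSON body for POST /_query with the field-loading pragma."""
    body = {
        "query": query,
        "pragma": {"field_extract_preference": field_extract_preference},
    }
    if accept_pragma_risks:
        # Assumption: the flag sits next to "query" and "pragma".
        # The commit message only says release builds need it.
        body["accept_pragma_risks"] = True
    return json.dumps(body, indent=4)


if __name__ == "__main__":
    # Prints a body equivalent to the one in the curl example.
    print(build_esql_request("FROM foo"))
```

The resulting string can be passed to `curl -d` or to any HTTP client as the body of the `_query` request.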
Oleksandr Kolomiiets 99262c6256
Use FallbackSyntheticSourceBlockLoader for boolean and date fields (#124050)
2025-03-05 11:43:47 -08:00
Gal Lalouche a6e47ae85b
Refactor FieldCapabilities creation by adding a proper builder object (#121310)
Reduce boilerplate associated with creating `FieldCapabilities` instances.
Since it's a class with a huge number of fields, it makes sense to define a builder object, as that can also help with all the Boolean and null blindness going on.
Note that while there is a static Builder class in `FieldCapabilities`, it is not a proper builder object (no setters, still need to pass a lot of otherwise default parameters) and is also package-private. To avoid changing that, I defined a new `FieldCapabilitiesBuilder` class. I also went over the code and refactored places which used the old constructor.
2025-03-05 13:09:36 +01:00
Martijn van Groningen 086329c5cb
Tidy up some noise during indexing with synthetic source. (#123724)
2025-02-28 16:52:17 +00:00
kanoshiou 7326928502
Fix failed ScaledFloatFieldMapperTests (#123144)
2025-02-21 11:34:46 -08:00
kanoshiou de41d5704b
ESQL: Fix precision of `scaled_float` field values retrieved from stored source (#122586)
2025-02-20 14:01:34 -08:00
Oleksandr Kolomiiets ba8c5764f8
Use FallbackSyntheticSourceBlockLoader for unsigned_long and scaled_float fields (#122637)
2025-02-18 09:28:26 -08:00
Oleksandr Kolomiiets b8d7e99cb9
Use FallbackSyntheticSourceBlockLoader for number fields (#122280)
2025-02-12 16:12:19 -08:00
Chris Hegarty 4baffe4de1
Upgrade to Lucene 10.1.0 (#119308)
This commit upgrades to Lucene 10.1.0.
2025-01-30 13:41:02 +00:00
Kostas Krikellas 8de9539e29
Lazy initialization for `SyntheticSourceSupport.loader()` (#120896)
* Lazy initialization for `SyntheticSourceSupport.loader()`

* [CI] Auto commit changes from spotless

* add missing

---------

Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>
2025-01-27 17:12:42 +02:00
Rene Groeschke ba61f8c7f7
Update Gradle wrapper to 8.12 (#118683)
This updates the gradle wrapper to 8.12

We addressed deprecation warnings due to the update that includes:

- Fix change in TestOutputEvent api
- Fix deprecation in groovy syntax
- Use latest ospackage plugin containing our fix
- Remove project usages at execution time
- Fix deprecated project references in repository-old-versions
2024-12-30 15:34:24 +01:00
Armin Braun e94f145350
Fix a bunch of non-final static fields (#119185)
Fixing almost all missing `final` spots; who knows, maybe we get a small speedup from
some constant folding here and there.
2024-12-26 19:14:36 +01:00
Dimitris Rempapis a514aad3c2
Fix/meta fields bad request (#117229)
A 400 rather than a 5xx error is returned when _source / _seq_no / _feature / _nested_path / _field_names is requested via fields.
2024-12-03 10:58:20 +02:00
Oleksandr Kolomiiets 54db947020
Fix scaled_float test (#117662)
2024-11-28 07:33:35 -08:00
Oleksandr Kolomiiets 2b8e4e727c
Migrate mapper-related modules to internal-*-rest-test (#117298)
2024-11-23 00:35:24 +00:00
Rene Groeschke f6ac6e1c3b
[Build] Remove deprecated BuildParams (#116984)
2024-11-22 16:30:57 +01:00
Rene Groeschke 13c8aaeffa
[Gradle] Remove static use of BuildParams (#115122)
Static fields don't do well in Gradle with configuration cache enabled.

- Use buildParams extension in build scripts
- Keep BuildParams.ci for now for easy serverless migration
- Tweak testing doc
2024-11-15 17:58:57 +01:00
Kostas Krikellas 4573ab8ec1
[TEST] Replace _source.mode with index.mapping.source.mode in integration tests - take 2 (#116072)
* Reapply "[TEST] Replace _source.mode with index.mapping.source.mode in integra…" (#116069)

This reverts commit e8bf344a28.

* [TEST] Replace _source.mode with index.mapping.source.mode in integration tests

* add reason

* add reason

* spotless

* revert unneeded
2024-11-04 09:39:34 +02:00
Kostas Krikellas e8bf344a28
Revert "[TEST] Replace _source.mode with index.mapping.source.mode in integra…" (#116069)
This reverts commit a360757968.
2024-11-01 10:53:08 +02:00
Kostas Krikellas a360757968
[TEST] Replace _source.mode with index.mapping.source.mode in integration tests (#115926)
* Replace _source.mode with index.mapping.source.mode in integration tests

* fix tests

* revert 40_source_mode_setting.yml
2024-11-01 09:46:06 +02:00
Nhat Nguyen f3b34f3e34
Remove old synthetic source mapping config (#115889)
This change replaces the old synthetic source config in mappings with 
the newly introduced index setting.

Closes #115859
2024-10-30 09:15:16 -07:00
Martijn van Groningen 387062eb80
Sometimes delegate to SourceLoader in ValueSourceReaderOperator for required stored fields (#115114)
If source is required by a block loader, then the StoredFieldsSpec that gets populated should be enhanced by SourceLoader#requiredStoredFields(...) in ValuesSourceReaderOperator. Otherwise, in case of synthetic source, many stored fields aren't loaded, which causes only a subset of _source to be synthesized. For example, unmapped fields or field values that exceed the configured ignore_above will not appear in _source.

This happens when field types fall back to a block loader implementation that uses _source. The required field values are then extracted from the source once loaded.

This change also reverts the production code changes introduced via #114903. That change only ensured that the _ignored_source field was added to the required list of stored fields. In reality, more fields could be required. This change is a better fix, since it also handles other cases and the SourceLoader implementation indicates which stored fields are needed.

Closes #115076
2024-10-23 10:20:42 +02:00
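The idea in #115114 above, widening the set of stored fields whenever a block loader needs source, can be illustrated with a small model. This is a sketch under stated assumptions and not the Elasticsearch implementation: the `StoredFieldsSpec` dataclass and the `with_source_loader_fields` helper are hypothetical stand-ins for the Java classes named in the commit message, and the field names in the usage note are made up apart from `_ignored_source`.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable


@dataclass(frozen=True)
class StoredFieldsSpec:
    """Hypothetical model: does the loader need _source, and which stored fields?"""
    requires_source: bool
    required_stored_fields: FrozenSet[str] = frozenset()


def with_source_loader_fields(spec: StoredFieldsSpec,
                              source_loader_fields: Iterable[str]) -> StoredFieldsSpec:
    """Add the stored fields the source loader says it needs.

    If the block loader does not need source, the spec is returned unchanged.
    Otherwise the result is the union of both sets, so that synthetic
    _source can be rebuilt completely instead of from a subset of fields.
    """
    if not spec.requires_source:
        return spec
    merged = spec.required_stored_fields | frozenset(source_loader_fields)
    return StoredFieldsSpec(True, merged)
```

With a spec that requires source and already lists one stored field, passing `["_ignored_source", "other_field"]` produces a spec containing all three names, while a spec that does not require source is left as it was.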
Luca Cavanna 8efd08b019
Upgrade to Lucene 10 (#114741)
The most relevant ES changes that upgrading to Lucene 10 requires are:

- use the appropriate IOContext
- Scorer / ScorerSupplier breaking changes
- Regex automaton are no longer determinized by default
- minimize moved to test classes
- introduce Elasticsearch900Codec
- adjust slicing code according to the added support for intra-segment concurrency
- disable intra-segment concurrency in tests
- adjust accessor methods for many Lucene classes that became a record
- adapt to breaking changes in the analysis area

Co-authored-by: Christoph Büscher <christophbuescher@posteo.de>
Co-authored-by: Mayya Sharipova <mayya.sharipova@elastic.co>
Co-authored-by: ChrisHegarty <chegar999@gmail.com>
Co-authored-by: Brian Seeders <brian.seeders@elastic.co>
Co-authored-by: Armin Braun <me@obrown.io>
Co-authored-by: Panagiotis Bailis <pmpailis@gmail.com>
Co-authored-by: Benjamin Trent <4357155+benwtrent@users.noreply.github.com>
2024-10-21 13:38:23 +02:00
Martijn van Groningen c62a96c8ab
Include ignored source as part of loading field values in ValueSourceReaderOperator via BlockSourceReader. (#114903)
Currently, in the compute engine, when loading source with synthetic source mode, the synthetic source loader is already used. But the _ignored_source field isn't always marked as a required stored field, causing the source to potentially miss a lot of fields.

This change includes the _ignored_source field as a required stored field and allows keyword fields without doc values or stored fields to be used in case of synthetic source.

Relying on synthetic source to get the values (because a field doesn't have stored fields / doc values) is slow. In case of synthetic source we already keep ignored fields/values in a special place, named ignored source. Long term, in case of synthetic source, we should only load ignored source when a field has no doc values or stored field, as is being explored in #114886, thereby avoiding synthesizing the complete _source in order to get only one field.
2024-10-18 07:49:00 +02:00
Oleksandr Kolomiiets 2c10a18774
Fix block loader tests for token_count (#113718)
2024-10-01 10:25:26 -07:00
Chris Hegarty 32dde26e49
Upgrade to Lucene 9.12.0 (#113333)
This commit upgrades to Lucene 9.12.0.

Co-authored-by: Adrien Grand <jpountz@gmail.com>
Co-authored-by: Armin Braun <me@obrown.io>
Co-authored-by: Benjamin Trent <ben.w.trent@gmail.com>
Co-authored-by: Chris Hegarty <chegar999@gmail.com>
Co-authored-by: John Wagster <john.wagster@elastic.co>
Co-authored-by: Luca Cavanna <javanna@apache.org>
Co-authored-by: Mayya Sharipova <mayya.sharipova@elastic.co>
2024-10-01 08:39:27 +01:00
Mark Vieira a59c182f9f
Add AGPLv3 as a supported license
2024-09-13 15:29:46 -07:00
Kostas Krikellas 86a88d735f
Fix synthetic source field names for multi-fields (#112850)
* Fix synthetic source field names for multi-fields

* enable logsdb in randomized tests

* Revert "enable logsdb in randomized tests"

This reverts commit 2e2c22e2bb.

* Update docs/changelog/112850.yaml

* fix
2024-09-13 15:00:55 +03:00
Oleksandr Kolomiiets 082e7211b3
Use fallback synthetic source for copy_to and doc_values: false cases (#112294)
2024-09-10 12:12:51 -07:00
Kostas Krikellas f3bc281978
Refactor build params for FieldMapper, adding SourceKeepMode (#112455)
* Refactor build params for FieldMapper

* more mappers and tests

* more mappers

* more mappers

* spotless

* spotless

* stored by default

* Revert "stored by default"

This reverts commit bbd247d64b.

* restore storeIgnored

* sync

* list valid values for SourceKeepMode

* small refactoring

* spotless
2024-09-06 14:16:17 +03:00
Oleksandr Kolomiiets 38adbb0724
Prevent synthetic field loaders accessing stored fields from using stale data (#112173)
2024-08-27 14:55:00 -07:00