281 Commits

Author SHA1 Message Date
Lorenzo Dematté 77dac65761
Fix NodeInfo version parsing in bwc tests (#100838)
* Mixed cluster tests with string NodeInfo version

- Move version-based feature comparison to a common, deprecated method (to be replaced with real features)
- Use string comparison against the old cluster version to partition new/old cluster nodes
2023-10-25 12:59:10 +02:00
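A minimal sketch of partitioning nodes by plain string comparison. It assumes each node reports its version as a string and that any node whose string equals the old cluster's version is an old node. `NodeVersionPartition` and the version values are hypothetical and are not part of the test framework's API.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper, not part of the Elasticsearch test framework.
final class NodeVersionPartition {
    /**
     * Splits node version strings into old (key true) and new (key false) groups by exact
     * string equality with the old cluster version, so no version parsing is required.
     */
    static Map<Boolean, List<String>> partition(List<String> nodeVersions, String oldClusterVersion) {
        return nodeVersions.stream()
            .collect(Collectors.partitioningBy(oldClusterVersion::equals));
    }

    public static void main(String[] args) {
        Map<Boolean, List<String>> parts =
            partition(List.of("8.10.4", "8.11.0", "8.10.4"), "8.10.4");
        System.out.println(parts.get(true));  // prints "[8.10.4, 8.10.4]"
        System.out.println(parts.get(false)); // prints "[8.11.0]"
    }
}
```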
Lorenzo Dematté f878a8c308
Fix NodeInfo version parsing in integration tests (#100770)
* Compatible version parsing in YAML tests
* Compatible version parsing in various IT tests
2023-10-25 10:55:15 +02:00
Simon Cooper 5f43cd8f46
Retry rolling upgrade junit tests (#99760)
Re-applies the changes from #99572 to move some bwc tests to a junit-based build infrastructure. Some tests that did not handle the move well have been kept in rolling-upgrade-legacy, using the old Gradle-based infrastructure.
2023-09-22 15:52:59 +01:00
Simon Cooper 06f09d861d
Revert "Migrate rolling upgrade tests to new junit format" (#99750)
Reverts elastic/elasticsearch#99572 and #99733

The new tests are unstable, and don't work on CI. This re-opens
https://github.com/elastic/elasticsearch/issues/97200
2023-09-21 09:42:04 -04:00
Simon Cooper 1b8df61bd6
Limit test parallelism to 1 for junit bwc tests (#99733)
Gradle runs test tasks in parallel, which results in multiple test clusters being created and breaks CI.
2023-09-21 11:16:37 +01:00
Simon Cooper aae2535235
Migrate rolling upgrade tests to new junit format (#99572)
Two test suites did not react well to the junit-based bwc infrastructure, so those have been separated into a legacy module using the old gradle-based system until they can be looked at properly.
This unblocks the 8.11 release.
2023-09-20 16:55:34 +01:00
Martijn van Groningen 0eb2181bb1
Don't ignore empty index templates that have no template definition. (#98840)
A composable index template with no template defined in the body is mistakenly always assumed not to be a time series template, even if it refers to a component template that has the index.mode setting set to time_series and the component template defines mappings with dimension fields or routing paths.

Closes #98834
2023-09-06 07:47:01 +02:00
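A minimal sketch of the corrected check. It models each template as a flat settings map. Component templates in `composed_of` are applied in order. The composable template's own settings are applied last, if it has a body at all. Only after that is `index.mode` inspected. `TemplateModeResolver` and its records are hypothetical simplifications that ignore mappings; they are not Elasticsearch's template resolution code.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical simplification of composable/component templates (settings only, no mappings).
final class TemplateModeResolver {
    record ComponentTemplate(Map<String, String> settings) {}

    // templateSettings is null when the composable template has no "template" body at all.
    record ComposableTemplate(Map<String, String> templateSettings, List<String> composedOf) {}

    static boolean isTimeSeries(ComposableTemplate template, Map<String, ComponentTemplate> components) {
        Map<String, String> merged = new HashMap<>();
        // Component templates are applied in composed_of order; later ones override earlier ones.
        for (String name : template.composedOf()) {
            ComponentTemplate component = components.get(name);
            if (component != null) {
                merged.putAll(component.settings());
            }
        }
        // A missing body contributes nothing; it must not short-circuit the check to "false".
        if (template.templateSettings() != null) {
            merged.putAll(template.templateSettings());
        }
        return "time_series".equals(merged.get("index.mode"));
    }
}
```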
Simon Cooper bebe2538b1
Bump to first non-release IndexVersion (#98478)
This bumps to an IndexVersion that is not associated with any specific release version. From this point, index metadata/data versioning will be handled in the same way as TransportVersion: a new constant for every change.
2023-08-31 15:10:40 +01:00
Benjamin Trent 4b5585e428
Fix tests after indexing dense vectors by default (#98946)
In 8.11, vectors will be indexed by default. However, various
tests that relied on the previous behavior were not updated.

This PR updates those tests.

Related: https://github.com/elastic/elasticsearch/pull/98268
2023-08-29 11:06:02 -04:00
Simon Cooper b67a9e1ec3
Move text references to index created version to IndexVersion (#98727) 2023-08-23 10:51:56 +01:00
Simon Cooper 0754bd8943
Unmute FeatureUpgradeIT test (#98576)
Test was fixed by #98574
2023-08-17 13:35:52 +01:00
Tim Vernum 0c35785d22
Mute FeatureUpgradeIT (#98563)
Relates: #98562
2023-08-17 03:16:37 -04:00
David Turner 4458517a28
Fix testClosedIndexNoopRecovery (#98093)
This test only delays allocation for 120s, which isn't necessarily enough
time for the node to come back, and yet it requires the shard to be
assigned back to the first node. This commit extends the timeout to make
sure the shard is reassigned as required, and tidies up a few other
aspects.

Closes #97896
2023-08-01 13:51:57 +01:00
Ievgen Degtiarenko 32fb774976
Enable testSnapshotBasedRecovery (#97654)
The cancel shard allocation command is broken in the initial desired balance versions
and might allocate a shard on a node where it is not supposed to be. This
is fixed by https://github.com/elastic/elasticsearch/pull/93635. The test is disabled
when upgrading from affected versions.
2023-07-13 16:27:58 +02:00
Simon Cooper 602ccd8d60
Migrate IndexMetadata.getCreationVersion to IndexVersion (#97139) 2023-06-29 08:38:50 +01:00
Armin Braun c41bda9e3a
Dry up remaining verbose index setting building in tests (#95652)
Last spots I could easily find via regex.
Follow-up to #95569
2023-04-28 11:18:07 +02:00
Slobodan Adamović ee723ada0a
Mute SnapshotBasedRecoveryIT testSnapshotBasedRecovery (#94584)
Relates to https://github.com/elastic/elasticsearch/issues/93271
2023-03-21 08:45:27 -04:00
William Brafford 8b6ce3465e
Reset feature states before deleting indices in EsRestTestCase (#94191)
* Call feature cleanup in ES Test Case
* Don't reset features in certain integration tests
* Cleanup and handle warnings
2023-03-07 10:53:48 -05:00
Simon Cooper 8666136894
Remove some references to Version.transportVersion (#93861) 2023-02-20 11:55:30 +00:00
William Brafford b89fab2f01
Stop suggesting feature migrations for 7.x indices (#93666)
* Don't report MIGRATION_NEEDED for 7.x indices

Eventually, we will need to migrate 7.x indices to 8.x before doing a
significant upgrade of Lucene. However, the migrations to 8.x are not
adequately tested: while they will eventually be needed, they are not
currently needed, and may in fact produce bugs.

This change will ensure that the GET _migration/system_feature API
returns NO_MIGRATION_NEEDED in 8.x. We will begin to require
migrations once we start testing for the next major version upgrade.
2023-02-10 11:58:54 -05:00
David Turner f068cfda18
Only simulate legal desired moves (#93635)
Today when setting up for the desired balance computation we move all
shards to their desired locations without checking any allocation rules.
However, certain allocation rules (e.g. those related to node versions
and shutdowns) may prevent these movements in reality, resulting in a
shard which cannot move to its desired location but which may not remain
on its current node either.

This commit adds some checks to verify that these preliminary moves are
still legal when setting up the computation.

Closes #93271
2023-02-09 08:44:30 -05:00
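A minimal sketch of the setup step. It uses a single yes/no predicate to stand in for all allocation rules. A shard is placed on its desired node only if that move is legal; otherwise it stays on its current node. `DesiredMoveSimulation`, `Shard`, and the predicate are hypothetical and are not the desired balance allocator's API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiPredicate;

// Hypothetical illustration, not Elasticsearch's desired balance code.
final class DesiredMoveSimulation {
    // Hypothetical shard model: where it is now and where the balancer wants it.
    record Shard(String id, String currentNode, String desiredNode) {}

    /** Returns the node each shard occupies after the preliminary, legal-only moves. */
    static List<String> simulate(List<Shard> shards, BiPredicate<Shard, String> canAllocate) {
        List<String> placement = new ArrayList<>();
        for (Shard shard : shards) {
            // Only simulate the move when the allocation rules would permit it in reality.
            boolean legal = canAllocate.test(shard, shard.desiredNode());
            placement.add(legal ? shard.desiredNode() : shard.currentNode());
        }
        return placement;
    }
}
```

For example, if `node-3` refuses all shards (say, because it is shutting down), a shard whose desired node is `node-3` stays where it is instead of being moved to a node it could never actually reach.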
David Turner 438f2f8daf
Mute flaky testSnapshotBasedRecovery
Relates #93271
2023-02-06 17:48:22 +00:00
David Turner afc24e1326
More logging for testSnapshotBasedRecovery (#93513)
Relates #93271
2023-02-06 07:42:45 -05:00
David Turner eb9eeae5ed
More investigation into 93271 (#93454)
We still don't properly understand why this test is failing, and it
doesn't reproduce locally, so this commit adds a little extra logging to
capture extra detail from a failure in CI.
2023-02-02 07:03:51 -05:00
Simon Cooper f086dd18dd
Migrate misc packages to TransportVersion (#93272) 2023-01-31 11:24:32 +00:00
Ievgen Degtiarenko 7b9df003b9
Disable unnecessary logs collection (#93339) 2023-01-30 13:16:19 +01:00
Ievgen Degtiarenko 9cce7f12eb
Mute flaky testSnapshotBasedRecovery (#93296) 2023-01-27 12:57:59 +01:00
Ievgen Degtiarenko 35e0c4d989
Additional debug logging (#93277) 2023-01-26 18:51:46 +01:00
Ievgen Degtiarenko ce7ed7bd9d
Do not emit deprecation warning in the shared context (#93183) 2023-01-26 11:42:36 +01:00
Ievgen Degtiarenko 0ddbd25100
Prevent deprecation field usage in SnapshotBasedRecoveryIT (#93182) 2023-01-25 13:31:48 +01:00
Benjamin Trent 2f14b549e5
Adding rolling upgrade KNN search tests (#93043) 2023-01-23 11:41:46 -05:00
Artem Prigoda 2bc7398754
Use `Strings.format` instead of `String.format(Locale.ROOT, ...)` in tests (#92106)
Use the locale-independent `Strings.format` method instead of `String.format(Locale.ROOT, ...)`.
Inline `ESTestCase.forbidden` calls with `Strings.format` for consistency's sake.
Add `Strings.format` alias in `common.Strings`
2023-01-03 19:28:27 +01:00
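The locale issue can be shown with the JDK alone. `String.format` without an explicit locale uses the JVM default, so the decimal separator depends on the machine. In the minimal sketch below, `LocaleFormatDemo.format` is a hypothetical stand-in that pins `Locale.ROOT`, in the spirit of `Strings.format`. It is not that method's actual implementation.

```java
import java.util.Locale;

final class LocaleFormatDemo {
    /** Hypothetical helper: always formats with Locale.ROOT, independent of the JVM default. */
    static String format(String pattern, Object... args) {
        return String.format(Locale.ROOT, pattern, args);
    }

    public static void main(String[] args) {
        // A German locale uses a comma as the decimal separator.
        System.out.println(String.format(Locale.GERMANY, "%.1f", 1.5)); // prints "1,5"
        System.out.println(format("%.1f", 1.5));                        // prints "1.5"
    }
}
```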
Albert Zaharovits 9e045401df
mute snapshot based recovery (#91847) 2022-11-24 16:42:53 +01:00
Francisco Fernández Castaño 1203750520
Add more debug info to SnapshotBasedRecoveryIT (#91745)
Relates #91383
2022-11-21 13:21:01 +01:00
Francisco Fernández Castaño 87ffc24cf1
Unmute SnapshotBasedRecoveryIT#testSnapshotBasedRecovery (#91719) 2022-11-18 12:14:26 -05:00
Francisco Fernández Castaño fcefcb6b0e
Mute SnapshotBasedRecoveryIT#testSnapshotBasedRecovery (#91687) 2022-11-18 05:50:22 -05:00
Nikola Grcevski 0652a59155
Allow legacy index settings on legacy indices (#90264)
Add code to ignore unknown index settings on
old indices, to allow rolling-upgrades to work.

Co-authored-by: David Turner <david.turner@elastic.co>
2022-11-02 13:58:44 -04:00
Rene Groeschke 43a0377735
Update forbiddenapis to 3.4 (#90624)
Fix breaking changes to source validation after the change in the default JDK rule set
2022-10-06 16:52:06 +02:00
Francisco Fernández Castaño eaf188b71f
Fix DesiredNodesUpgradeIT upgrade from 8.1 (#89907)
Doubles get cast to int during parsing in 8.1/8.2, so use 0.5f as we did before.

Closes #89877
Closes #90004
2022-09-13 14:18:50 +02:00
David Kyle 19f804727b
Mute DesiredNodesUpgradeIT (#90005)
For #90004
2022-09-12 23:23:11 +09:30
Francisco Fernández Castaño 284dce6a2a
Centralize the concept of processors configuration (#89662)
This commit centralizes the processor count concept into the Processors class.
With this change now all the places using a processor count rely on this new class,
such as desired nodes, `node.processors` setting and autoscaling deciders.

- Processor counts are rounded to up to 5 decimal places
- Processors can be represented as doubles

Desired node processors were stored as floats, which poses some challenges during
upgrades: once the value is cast to a double, the precision increases and therefore
the number is no longer the same. To allow idempotent desired nodes updates after
upgrades, this commit introduces `DesiredNode#equalsWithProcessorsCloseTo(DesiredNode that)`,
which allows comparing two desired nodes whose processor specifications, compared as floats,
differ by up to a max delta.
2022-09-06 18:17:36 +02:00
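The precision problem can be reproduced in a few lines. Widening a `float` to `double` exposes the float's binary approximation, so the widened value no longer equals the same decimal parsed directly as a double. Below is a minimal sketch of a tolerant comparison performed on floats. It uses a hypothetical `MAX_DELTA`; the real `equalsWithProcessorsCloseTo` may use a different tolerance and also compares the other node fields.

```java
// Hypothetical illustration, not the DesiredNode implementation.
final class ProcessorsComparison {
    // Hypothetical tolerance chosen only for illustration.
    static final double MAX_DELTA = 1e-5;

    /** Compares two processor counts as floats, tolerating a small difference. */
    static boolean closeTo(double a, double b) {
        return Math.abs((float) a - (float) b) <= MAX_DELTA;
    }

    public static void main(String[] args) {
        float stored = 0.1f;                       // value persisted as a float before the upgrade
        double widened = stored;                   // read back as a double after the upgrade
        System.out.println(widened == 0.1);        // prints "false" (0.10000000149011612 != 0.1)
        System.out.println(closeTo(widened, 0.1)); // prints "true"
    }
}
```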
Nikola Grcevski 5af8ec52fe
Support camel case dates on 7.x indices (#88914)
This adds back compatibility support for camel case dates
for 7.x indices used in 8.x.
2022-08-16 15:57:59 -04:00
Alan Woodward 5c11a81913
Add 'mode' option to `_source` field mapper (#88211)
Currently we have two parameters that control how the source of a document
is stored, `enabled` and `synthetic`, both booleans. However, there are only
three possible combinations of these, with `enabled:false` and `synthetic:true`
being disallowed. To make this easier to reason about, this commit replaces
the `enabled` parameter with a new `mode` parameter, which can take the values
`stored`, `synthetic` and `disabled`. The `mode` parameter cannot be set
in combination with `enabled`, and we will subsequently move towards
deprecating `enabled` entirely.
2022-07-18 12:50:10 +01:00
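A minimal sketch of how the legacy boolean pair maps onto the three `mode` values. It assumes only the rules stated above: `_source` is enabled by default, `enabled:false` with `synthetic:true` is rejected, and `mode` cannot be combined with `enabled`. `SourceModeResolver` and its error messages are hypothetical and are not the mapper's code.

```java
// Hypothetical illustration of the rules in the commit message, not SourceFieldMapper.
final class SourceModeResolver {
    enum Mode { STORED, SYNTHETIC, DISABLED }

    /** Resolves the effective mode; null arguments mean "not set in the mapping". */
    static Mode resolve(Mode mode, Boolean enabled, Boolean synthetic) {
        if (mode != null) {
            if (enabled != null) {
                // Hypothetical message: mode and enabled are mutually exclusive.
                throw new IllegalArgumentException("Cannot set both [mode] and [enabled]");
            }
            return mode;
        }
        boolean isEnabled = enabled == null || enabled; // _source is enabled by default
        boolean isSynthetic = synthetic != null && synthetic;
        if (!isEnabled && isSynthetic) {
            // The one disallowed combination: enabled:false with synthetic:true.
            throw new IllegalArgumentException("[synthetic] requires [enabled: true]");
        }
        if (!isEnabled) {
            return Mode.DISABLED;
        }
        return isSynthetic ? Mode.SYNTHETIC : Mode.STORED;
    }
}
```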
Francisco Fernández Castaño eb8c4ba97b
Keep track of desired nodes status in cluster state (#87474)
This commit adds desired nodes status tracking to the cluster state. Previously, status was tracked
in memory by DesiredNodesMembershipService; this approach had certain limitations and made
the consumer code more complex. This commit takes a simpler approach: it keeps the status updated when
the desired nodes are updated or when a new node joins and stores it in the cluster state,
which makes that information easy to consume wherever it is needed.
Additionally, this commit moves test code from depending directly on DesiredNodes, which can be
seen as an internal data structure, to relying more on UpdateDesiredNodesRequest.

Relates #84165
2022-06-16 11:08:05 +02:00
Francisco Fernández Castaño e91e7e653b
Add support for CPU ranges in desired nodes (#86434)
This commit adds support for CPU ranges in the desired nodes API.

This aligns better with environments where administrators/orchestrators
can define lower and upper bounds for the amount of CPUs that the
desired node would get once deployed.

This makes it possible to provide information about the expected CPUs and the
allowed overcommit of the machine the desired node will run on.

This was the previous expected body for the desired nodes API (we still support it):
```
PUT /_internal/desired_nodes/history/1
{
    "nodes" : [
        {
            "settings" : {
                 "node.name" : "instance-000187",
                 "node.external_id": "instance-000187",
                 "node.roles" : ["data_hot", "master"],
                 "node.attr.data" : "hot",
                 "node.attr.logical_availability_zone" : "zone-0"
            },
            "processors" : 8, 
            "memory" : "58gb",
            "storage" : "1700gb",
            "node_version" : "8.3.0"
        }
    ]
}
```

Now it's possible to define `processors` or `processors_range` as in:
```
PUT /_internal/desired_nodes/history/1
{
    "nodes" : [
        {
            "settings" : {
                 "node.name" : "instance-000187",
                 "node.external_id": "instance-000187",
                 "node.roles" : ["data_hot", "master"],
                 "node.attr.data" : "hot",
                 "node.attr.logical_availability_zone" : "zone-0"
            },
            "processors_range" : {"min": 8.0, "max": 16.0},
            "memory" : "58gb",
            "storage" : "1700gb",
            "node_version" : "8.3.0"
        }
    ]
}
```
Note that `max` in `processors_range` is optional.
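
A minimal sketch of a `processors_range` value with an optional `max`. It assumes validation requires a positive `min` and, when `max` is present, requires `max >= min`. The record and these checks are hypothetical; the desired nodes API may validate differently.

```java
import java.util.Objects;
import java.util.OptionalDouble;

// Hypothetical model of "processors_range"; max may be absent.
record ProcessorsRange(double min, OptionalDouble max) {
    ProcessorsRange {
        Objects.requireNonNull(max, "max");
        // !(min > 0) also rejects NaN.
        if (!(min > 0)) {
            throw new IllegalArgumentException("min must be greater than 0 but was " + min);
        }
        if (max.isPresent() && max.getAsDouble() < min) {
            throw new IllegalArgumentException("max [" + max.getAsDouble() + "] must be >= min [" + min + "]");
        }
    }
}
```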

This commit also switches from representing CPUs as integers to
accepting floating-point numbers.

Note: I disabled the bwc yamlRestTests for versions < 8.3 since we introduced
a few "breaking changes", but since this is an internal API it should be fine.
2022-05-20 11:47:32 +02:00
Nik Everett a589456b81
Synthetic source (#85649)
This attempts to shrink the index by implementing a "synthetic _source" field.
You configure it in the mapping:
```
{
  "mappings": {
    "_source": {
      "synthetic": true
    }
  }
}
```

And we just stop storing the `_source` field, kind of. When you go to access
the `_source` we regenerate it on the fly by loading doc values. Doc values
don't preserve the original structure of the source you sent so we have to
make some educated guesses. And we have a rule: the source we generate would
result in the same index if you sent it back to us. That way you can use it
for things like `_reindex`.

Fetching the `_source` from doc values does slow down loading somewhat. See
numbers further down.

## Supported fields
This only works for the following field types:
* `boolean`
* `byte`
* `date`
* `double`
* `float`
* `geo_point` (with precision loss)
* `half_float`
* `integer`
* `ip`
* `keyword`
* `long`
* `scaled_float`
* `short`
* `text` (when there is a `keyword` sub-field that is compatible with this feature)


## Educated guesses

The synthetic source generator produces `_source` that:
* is sorted alphabetically
* is as "objecty" as possible
* pushes all arrays to the "leaf" fields
* sorts most array values
* removes duplicate text and keyword values

These are mostly artifacts of how doc values are stored.

### sorted alphabetically
```
{
  "b": 1,
  "c": 2,
  "a": 3
}
```
becomes
```
{
  "a": 3,
  "b": 1,
  "c": 2
}
```

### as "objecty" as possible
```
{
  "a.b": "foo"
}
```
becomes
```
{
  "a": {
    "b": "foo"
  }
}
```
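A minimal sketch of the first two rules. It models the source as a flat `Map` of dotted paths to leaf values and assumes no key is both a leaf and a prefix of another key. Dotted keys are expanded into nested objects, and `TreeMap` gives alphabetical key order. The real generator reads doc values rather than a parsed map, so the hypothetical `SyntheticObjects` only illustrates the output shape.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration of the output shape, not the synthetic source generator.
final class SyntheticObjects {
    /** Expands "a.b"-style keys into nested, alphabetically sorted maps. */
    @SuppressWarnings("unchecked")
    static Map<String, Object> objectify(Map<String, Object> flat) {
        Map<String, Object> root = new TreeMap<>(); // TreeMap keeps keys sorted alphabetically
        for (Map.Entry<String, Object> entry : flat.entrySet()) {
            String[] path = entry.getKey().split("\\.");
            Map<String, Object> current = root;
            // Walk or create one nested object per path segment except the last.
            for (int i = 0; i < path.length - 1; i++) {
                current = (Map<String, Object>) current.computeIfAbsent(path[i], k -> new TreeMap<String, Object>());
            }
            current.put(path[path.length - 1], entry.getValue());
        }
        return root;
    }
}
```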

### pushes all arrays to the "leaf" fields
```
{
  "a": [
    {
      "b": "foo",
      "c": "bar"
    },
    {
      "c": "bort"
    },
    {
      "b": "snort"
    }
  ]
}
```
becomes
```
{
  "a": {
    "b": ["foo", "snort"],
    "c": ["bar", "bort"]
  }
}
```
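A minimal sketch of why arrays end up at the leaves. It treats each array element as a flat map of leaf values. Values are collected per field name, the way doc values store them, so the information about which object a value came from is lost. `LeafArrays` is hypothetical and leaves out value sorting and de-duplication.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration, not the synthetic source generator.
final class LeafArrays {
    /** Merges an array of objects into one object whose fields hold arrays of values. */
    static Map<String, List<Object>> pushToLeaves(List<? extends Map<String, ?>> objects) {
        Map<String, List<Object>> merged = new TreeMap<>();
        for (Map<String, ?> object : objects) {
            for (Map.Entry<String, ?> field : object.entrySet()) {
                // Values are grouped by field name only; the originating object is not kept.
                merged.computeIfAbsent(field.getKey(), k -> new ArrayList<>()).add(field.getValue());
            }
        }
        return merged;
    }
}
```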

### sorts most array values
```
{
  "a": [2, 3, 1]
}
```
becomes
```
{
  "a": [1, 2, 3]
}
```

### removes duplicate text and keyword values
```
{
  "a": ["bar", "baz", "baz", "baz", "foo", "foo"]
}
```
becomes
```
{
  "a": ["bar", "baz", "foo"]
}
```
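A minimal sketch of the sorting and de-duplication rules. Numeric leaves keep duplicates, while keyword leaves go through a sorted set. This is consistent with Lucene's sorted-numeric doc values, which keep duplicate values, and sorted-set doc values, which do not. `LeafValues` itself is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical illustration, not the doc values loaders.
final class LeafValues {
    /** Numeric leaves: sorted, duplicates kept. */
    static List<Long> numeric(List<Long> values) {
        List<Long> sorted = new ArrayList<>(values);
        sorted.sort(null); // null comparator means natural ordering
        return sorted;
    }

    /** Keyword leaves: sorted and de-duplicated. */
    static List<String> keyword(List<String> values) {
        return new ArrayList<>(new TreeSet<>(values));
    }
}
```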
## `_recovery_source`

Elasticsearch's shard "recovery" process needs `_source` *sometimes*. So does
cross cluster replication. If you disable source or filter it somehow, we store
a `_recovery_source` field for as long as the recovery process might need it.
When everything is running smoothly, that's generally a few seconds or minutes.
Then the field is removed on merge. This synthetic source feature continues
to produce `_recovery_source` and relies on it for recovery. It's *possible*
to synthesize `_source` during recovery but we don't do it.

That means that synthetic source doesn't speed up writing the index. But in the
future we might be able to turn this on to trade writing less data at index
time for slower recovery and cross cluster replication. That's an area of
future improvement.

## perf numbers

I loaded the entire tsdb data set with this change; here are the sizes:

```
           standard -> synthetic
store size  31.0 GB ->  7.0 GB  (77.5% reduction)
_source  24695.7 MB -> 47.6 MB  (99.8% reduction - synthetic is in _recovery_source)
```

A second `_forcemerge` a few minutes after Rally finishes should remove the
remaining 47.6 MB of `_recovery_source`.

With this change, fetching source for 1,000 documents seems to take about 500ms. I
spot-checked a lot of different areas and haven't seen any other performance hit. I
*expect* this performance impact depends on the number of doc values fields
in the index and how sparse they are.
2022-05-10 07:46:58 -04:00
Alan Woodward a5452603cc
Extra testing and some cleanups for filtering on field caps (#85068)
* adds a test for mixed cluster requests
* fixes a bad stream version check (above test will fail if this isn't included)
* replaces private FieldCapsFilter interface with Predicate
* renames 'allowedTypes' to 'types' to maintain consistency with external API
* adds javadoc to ResponseRewriter
* removes isRuntimeField from FieldTypeLookup

Relates to #83636
2022-03-29 11:38:52 +01:00
Nhat Nguyen 273eeddc14
Add BWC test for field-caps (#84455)
Relates #83494
2022-03-15 10:06:16 -04:00
Ryan Ernst 0ec229050e
Move yaml rest test case to separate test lib (#84835)
The ESClientYamlSuiteTestCase is used to run YAML tests throughout
Elasticsearch. It uses the low-level REST client to sniff for
nodes, but the sniffer is not needed anywhere else in the test
framework.

This commit creates a new project, `:test:rest-runner`, which is meant to
house the REST test running infrastructure. This has two purposes. The first
is to remove the sniffer from the test framework dependencies, because
it transitively depends on Jackson. The second is to set up the runner for
future refactorings where it could be made to not depend on the entire
test framework, though how that could work is left for the future.
2022-03-11 10:51:11 -05:00
Nik Everett 37ea6a8255
TSDB: Support GET and DELETE and doc versioning (#82633)
This adds support for GET and DELETE and the ids query and
Elasticsearch's standard document versioning to TSDB. So you can do
things like:
```
POST /tsdb_idx/_doc?filter_path=_id
{
  "@timestamp": "2021-12-29T19:25:05Z", "uid": "adsfadf", "v": 1.2
}
```

That'll return `{"_id" : "BsYQJjqS3TnsUlF3aDKnB34BAAA"}` which you can turn
around and fetch with
```
GET /tsdb_idx/_doc/BsYQJjqS3TnsUlF3aDKnB34BAAA
```
just like any other document in any other index. You can delete it too!
Or fetch it.

The ID comes from the dimensions and the `@timestamp`. So you can
overwrite the document:
```
POST /tsdb_idx/_bulk
{"index": {}}
{"@timestamp": "2021-12-29T19:25:05Z", "uid": "adsfadf", "v": 1.2}
```

Or you can write only if it doesn't already exist:
```
POST /tsdb_idx/_bulk
{"create": {}}
{"@timestamp": "2021-12-29T19:25:05Z", "uid": "adsfadf", "v": 1.2}
```

This works by generating an id from the dimensions and the `@timestamp`
when parsing the document. The id looks like:
* 4 bytes of hash from the routing calculated from routing_path fields
* 8 bytes of hash from the dimensions
* 8 bytes of timestamp
All that's base 64 encoded so that `Uid` can chew on it fairly
efficiently.

When it comes time to fetch or delete documents we base 64 decode the id
and grab the routing from the first four bytes. We use that hash to pick
the shard. Then we use the entire ID to perform the fetch or delete.
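
A minimal sketch of that layout. It uses little-endian integers throughout; the text states little endian only for the timestamp, so the byte order of the two hashes is an assumption. It also uses URL-safe base 64 without padding, which turns 20 bytes into 27 characters, the length of the example id above. The hash values are plain inputs here, since how they are computed is not shown, and `TsdbIdSketch` is hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Base64;

// Hypothetical sketch of the id layout, not Elasticsearch's implementation.
final class TsdbIdSketch {
    static final int ID_LENGTH = 4 + 8 + 8;

    /** Builds the 20-byte id (routing hash, dimensions hash, timestamp) and base 64 encodes it. */
    static String createId(int routingHash, long dimensionsHash, long timestampMillis) {
        ByteBuffer buffer = ByteBuffer.allocate(ID_LENGTH).order(ByteOrder.LITTLE_ENDIAN);
        buffer.putInt(routingHash);      // bytes 0-3: used to pick the shard
        buffer.putLong(dimensionsHash);  // bytes 4-11: identifies the time series
        buffer.putLong(timestampMillis); // bytes 12-19: @timestamp in epoch millis
        return Base64.getUrlEncoder().withoutPadding().encodeToString(buffer.array());
    }

    /** Decodes the id and reads the routing hash from the first four bytes. */
    static int routingHash(String id) {
        return ByteBuffer.wrap(Base64.getUrlDecoder().decode(id)).order(ByteOrder.LITTLE_ENDIAN).getInt(0);
    }

    /** Decodes the id and reads the timestamp from the last eight bytes. */
    static long timestamp(String id) {
        return ByteBuffer.wrap(Base64.getUrlDecoder().decode(id)).order(ByteOrder.LITTLE_ENDIAN).getLong(12);
    }
}
```

As a check, `timestamp("BsYQJjqS3TnsUlF3aDKnB34BAAA")` returns 1640805905000, which is `2021-12-29T19:25:05Z`, the `@timestamp` of the example document.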

We don't implement update actions because we haven't written the
infrastructure to make sure the dimensions don't change. It's possible
to do, but feels like more than we need now.

There are a *ton* of compromises with this. The long-term sad thing is that it
locks us into *indexing* the id of the sample. It'll index fairly
efficiently because each time series will have the same first eight
bytes. It's also possible we'd share many of the first few bytes in the
timestamp as well. In our tsdb rally track this costs 8.75 bytes per
document. It's substantial, but not overwhelming.

In the short term there are lots of problems that I'd like to save for a
follow up change:
1. ~~We still generate the automatic `_id` for the document but we don't use
   it. We should stop generating it.~~ Included in this PR based on review comments.
2. We generate the time series `_id` on each shard and when replaying
   the translog. It'd be the good kind of paranoid to generate it once
   on the primary and then keep it forever.
3. We have to encode the `_id` as a string to pass it around
   Elasticsearch internally. And Elasticsearch assumes that a loaded id
   was always stored as bytes encoded by `Uid`, which *does*
   have a nice encoding for base 64 bytes. But this whole thing requires
   us to make the bytes, base 64 encode them, and then hand them back to
   `Uid` to base 64 decode them into bytes. It's a bit hacky. And, it's
   a small thing, but if the first byte of the routing hash encodes to
   254 or 255, `Uid` spends an extra byte to encode it. One that'll
   always be a common prefix for tsdb indices, but still, it hurts my
   heart. It's just hard to fix.
4. We store the `_id` in Lucene stored fields for tsdb indices. Now
   that we're building it from the dimensions and the `@timestamp` we
   really don't *need* to store it. We could recalculate it when fetching
   documents. In the tsdb rally track this'd save us 6 bytes per document
   at the cost of marginally slower fetches. Which is *fine*.
5. There are several error messages that try to use `_id` right now
   during parsing but the `_id` isn't available until after the parsing
   is complete. And, if parsing fails, it may not be possible to know
   the id at all. All of these error messages will have to change,
   at least in tsdb mode.
6. ~~If you specify an `_id` on the request right now we just overwrite
   it. We should send you an error.~~ Included in this PR after review comments.
7. We have to entirely disable the append-only optimization that allows
   Elasticsearch to skip looking up the ids in lucene. This *halves*
   indexing speed. It's substantial. We have to claw that optimization
   back *somehow*. Something like sliding bloom filters or relying on
   the increasing timestamps.
8. We parse the source from json when building the routing hash when
   parsing fields. We should just build it from the parsed field values.
   It looks like that'd improve indexing speed by about 20%.
9. Right now we write the `@timestamp` little endian. This is likely bad for
   the prefix-encoded inverted index, which would prefer big endian. Switching might shrink it.
10. Improve error message on version conflict to include tsid and timestamp.
11. Improve error message when modifying dimensions or timestamp in update_by_query
12. Make it possible to modify dimension or timestamp in reindex.
13. Test TSDB's `_id` in `RecoverySourceHandlerTests.java` and `EngineTests.java`.
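
Item 9 can be checked with plain byte comparisons. An inverted index compares terms as unsigned bytes. Only a big-endian encoding makes that order match numeric order, and it also gives nearby timestamps a long shared prefix. Below is a minimal JDK-only sketch; it needs Java 9+ for `Arrays.compareUnsigned` and `Arrays.mismatch`. `TimestampByteOrder` is hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

// Hypothetical helper comparing how timestamp byte order affects term ordering and prefixes.
final class TimestampByteOrder {
    static byte[] encode(long timestamp, ByteOrder order) {
        return ByteBuffer.allocate(Long.BYTES).order(order).putLong(timestamp).array();
    }

    /** True if a's encoding sorts before b's when compared as unsigned bytes. */
    static boolean sortsBefore(long a, long b, ByteOrder order) {
        return Arrays.compareUnsigned(encode(a, order), encode(b, order)) < 0;
    }

    /** Number of leading bytes the two encodings share (-1 if identical). */
    static int sharedPrefix(long a, long b, ByteOrder order) {
        return Arrays.mismatch(encode(a, order), encode(b, order));
    }

    public static void main(String[] args) {
        long earlier = 1640805905000L; // 2021-12-29T19:25:05Z
        long later = earlier + 1000;   // one second later
        System.out.println(sortsBefore(earlier, later, ByteOrder.BIG_ENDIAN));     // true
        System.out.println(sortsBefore(earlier, later, ByteOrder.LITTLE_ENDIAN));  // false
        System.out.println(sharedPrefix(earlier, later, ByteOrder.BIG_ENDIAN));    // 6
        System.out.println(sharedPrefix(earlier, later, ByteOrder.LITTLE_ENDIAN)); // 0
    }
}
```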

I've had to make some changes as part of this that don't feel super
expected. The biggest one is changing `Engine.Result` to include the
`id`. When the `id` comes from the dimensions it is calculated by the
document parsing infrastructure, which happens in
`IndexShard#prepareIndex`, which returns an `Engine.IndexResult`. To make
everything clean I made it so `id` is available on all `Engine.Result`s
and I made all of the "outer results classes" read from
`Engine.Result#id`. I'm not excited by it. But it works and it's what
we're going with.

I've opted to create two subclasses of `IdFieldMapper`, one for standard
indices and one for tsdb indices. This feels like the right way to
introduce the distinction, especially if we don't want tsdb to carry
around its old fielddata support. Honestly if we *need* to aggregate on
`_id` in tsdb mode we have doc values for the `tsdb` and the
`@timestamp` - we could build doc values for `_id` on the fly. But I'm
not expecting folks will need to do this. Also! I'd like to stop storing
tsdb'd `_id` field (see number 4 above) and the new subclass feels like
a good place to put that too.
2022-03-10 10:05:27 -05:00