This PR adds a new API for doing streaming serialization writes to a repository to enable repository metadata of arbitrary size and at bounded memory during writing.
The existing write-APIs require knowledge of the eventual blob size beforehand. This forced us to materialize the serialized blob in memory before writing, costing a lot of memory in case of e.g. very large `RepositoryData` (and limiting us to `2G` max blob size).
With this PR the requirement to fully materialize the serialized metadata goes away and the memory overhead becomes completely bounded by the outbound buffer size of the repository implementation.
As we move to larger repositories this makes master node stability a lot more predictable since writing out `RepositoryData` does not take as much memory any longer (same applies to shard level metadata), enables aggregating multiple metadata blobs into a single larger blobs without massive overhead and removes the 2G size limit on `RepositoryData`.
Change the formatter config to sort / order imports, and reformat the
codebase. We already had a config file for Eclipse users, so Spotless now
uses that.
The "Eclipse Code Formatter" plugin ought to be able to use this file as
well for import ordering, but in my experiments the results were poor.
Instead, use IntelliJ's `.editorconfig` support to configure import
ordering.
I've also added a config file for the formatter plugin.
Other changes:
* I've quietly enabled the `toggleOnOff` option for Spotless. It was
already possible to disable formatting for sections using the markers
for docs snippets, so enabling this option just accepts this reality
and makes it possible via `formatter:off` and `formatter:on` without
the restrictions around line length. It should still only be used as
a very last resort and with good reason.
* I've removed mention of the `paddedCell` option from the contributing
guide, since I haven't had to use that option for a very long time. I
moved the docs to the spotless config.
When libs/core was created, several classes were moved from server's
o.e.common package, but they were not moved to a new package. Split
packages need to go away long term, so that Elasticsearch can even think
about modularization. This commit moves all the classes under o.e.common
in core to o.e.core.
relates #73784
Use an iterator instead of a list when passing around what to delete.
In the case of very large deletes the iterator is a much smaller than
the actual list of files to delete (since we save all the prefixes
which adds up if the individual shard folders contain lots of deletes).
Also this commit as a side-effect adjusts a few spots in logging where the
log messages could be catastrophic in size when trace logging is activated.
There should be a singleton for the empty version of this.
All the copying to `String[]` or use as an iterator make
no sense either when we can just use the list outright.
Today we do not specify the `?seed=` parameter when running the
repository analyzer in REST tests, so we cannot reproduce the set
of operations that led to a failure. This commit introduces a
deterministic value for this parameter.
Relates #72358 which seems to indicate some kind of bug in how certain
checksums are calculated in the test fixtures.
We rely on the repository implementation correctly handling the case where a
write is aborted before it completes. This is not guaranteed for third-party
repositories.
This commit adds a rare action during analysis which aborts the write
just before it completes and verifies that the target blob is not found
by any node.
This commit moves the data tier roles to server. It is no longer
necessary to separate these roles from server as we no longer build
distributions that would not contain these roles. Moving these roles
will simplify many things. This is deliberately the smallest possible
commit that moves these roles. Other aspects related to the data tiers
can move in separate, also small, commits.
Today we leniently permit overwrites of blobs in a repository not to be
atomic, since they are not in shared filesystem repositories. In fact
it's worse, on Windows overwrites do not even work if there is a
concurrent reader. In practice this isn't very important, we do almost
no overwrites and almost never read the file that's being overwritten,
but we do still test for atomic overwrites in the repository analyzer.
With this commit we suppress the atomic overwrite checks in the
repository analyzer for FS repositories, and remove the lenience since
all other repositories should implement atomic overwrites correctly.
Closes#70303
The repository analyzer does not write empty blobs to the repository,
and we assert that the blobs are nonempty, but the tests randomly check
for the empty case anyway. This commit ensures that the blobs used in
tests are always nonempty.
Relates #67247
This test would occasionally generate an invalid request in which the
max total data size cannot be satisfied. This commit ensures the max
total data size is always sufficiently large.
Relates #67247Closes#69219
Today blob overwrites are not completely atomic, there is an
intermediate state in which the blob is missing, but the repository
analyser does not fully account for that intermediate state.
This commit relaxes the check for missing blobs if they were
overwritten.
Closes#69087
Today we rely on blob stores behaving in a certain way so that they can be used
as a snapshot repository. There are an increasing number of third-party blob
stores that claim to be S3-compatible, but which may not offer a suitably
correct or performant implementation of the S3 API. We rely on somesubtle
semantics with concurrent readers and writers, but some blob stores may not
implement it correctly. Hitting a corner case in the implementation may be rare
in normal use, and may be hard to reproduce or to distinguish from an
Elasticsearch bug.
This commit introduces a new `POST /_snapshot/.../_analyse` API which exercises
the more problematic corners of the repository implementation looking for
correctness bugs and measures the details of the performance of the repository
under concurrent load.