75 Commits

Author SHA1 Message Date
Stuart Tettemer ecd13e3f11
Metrics test framework (#101168)
Adds a test framework that validates instruments are registered before they are called and are not double registered.

Also records all invocations of Instruments and allows test authors to add validation to instruments.
2023-10-24 09:05:16 -05:00
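The checks described in #101168 can be illustrated with a small, self-contained sketch. This is not the framework added by the commit: `RecordingRegistry` and its method names are hypothetical stand-ins, assumed here only to show the three ideas in the message (no use before registration, no double registration, and recording every invocation).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical test-only registry; not the Elasticsearch class.
class RecordingRegistry {
    // Instrument name -> every value recorded against it, in call order.
    private final Map<String, List<Long>> invocations = new HashMap<>();

    // Registering the same instrument name twice is rejected.
    void registerLongCounter(String name) {
        if (invocations.putIfAbsent(name, new ArrayList<>()) != null) {
            throw new IllegalStateException("instrument [" + name + "] is already registered");
        }
    }

    // Calling an instrument that was never registered is rejected;
    // otherwise the invocation is recorded for later validation.
    void incrementBy(String name, long value) {
        List<Long> recorded = invocations.get(name);
        if (recorded == null) {
            throw new IllegalStateException("instrument [" + name + "] is not registered");
        }
        recorded.add(value);
    }

    // Test authors can inspect the recorded invocations of an instrument.
    List<Long> recorded(String name) {
        return List.copyOf(invocations.getOrDefault(name, List.of()));
    }
}
```

A test would register its instruments up front, exercise the code under test, and then assert on `recorded(...)`.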
Yang Wang aa30dad01f
S3 CAS operation should respect abortMultipartUpload failure (#101253)
We inadvertently made the S3 CAS operation ignore abortMultipartUpload
failures in #98664. This PR fixes it.
2023-10-24 04:37:05 -04:00
David Turner 2757e30010
Make S3 anti-contention delay configurable (#101245)
The anti-contention delay in the S3 repository's compare-and-exchange
operation is hard-coded at 1 second today, but sometimes we encounter a
repository that needs much longer to perform a compare-and-exchange
operation when under contention. With this commit we make the
anti-contention delay configurable.
2023-10-24 08:13:55 +01:00
Stuart Tettemer d8b2c52c82
Metrics refactor - split registry and service (#101154)
This splits out the registry and the service, which makes testing easier and removes much of the delegation from the old `APMMeter` to `Instruments` (now renamed `APMMeterRegistry`).

APMMeterService takes care of the lifecycle and APMMeterRegistry holds the instruments.
2023-10-23 13:28:46 -05:00
David Turner 1eda6ac74b
Extract ESIntegTestCase#prepareSearch (#101175)
Relates #101172
2023-10-20 06:18:58 -04:00
Ryan Ernst 8a1db8c6c3
Move index version constants to IndexVersions (#101094)
Similar to the TransportVersions holder class, IndexVersions is the new
place to contain all constants for IndexVersion. This commit moves all
existing constants to the new class. It is purely mechanical.
2023-10-19 20:44:51 -04:00
Armin Braun bae6991fb3
Remove ~600 references to SearchResponse in tests (#100966)
We'd like to make `SearchResponse` reference counted and pooled but there are around 6k
instances of tests that create a `SearchResponse` local variable that would need to be
released manually to avoid leaks in the tests.
This does away with about 10% of these spots by adding an override for `assertHitCount`
that handles the actual execution of the search request and its release automatically
and making use of it in all spots where the `.get()` on the request builder could be inlined
semi-automatically and in a straightforward fashion without other code changes.
2023-10-17 15:43:36 +02:00
Ievgen Degtiarenko 14952cd7ea
Fix collision in field names (#100940)
Emitted metrics could not be indexed because the elasticsearch.metrics.s3.exceptions field
is both a long counter and a parent object for a histogram. This change renames
the histogram to avoid the conflict.
2023-10-17 09:40:41 +02:00
Yang Wang 65b4d594ae
Push s3 requests count via metrics API (#100383)
This PR builds on top of #100464 to publish s3 request count via the metrics API.
The metric takes the name of `repositories.requests.count` with 
attributes/dimensions of 
`{"repo_type": "s3", "repo_name": "xxx", "operation": "xxx", "purpose": "xxx"}`.

Closes: ES-6801
2023-10-16 10:01:26 +11:00
David Turner 63030a31cb
Compute repo format version within delete/cleanup (#100714)
Today we rely on the caller computing the appropriate repository format
version based on the nodes in the cluster and the snapshots in (some
recent copy of) the `RepositoryData`. This commit moves that computation
into `createSnapshotsDeletion` so that (a) we can be sure to use the
same `RepositoryData` used for the rest of the process, and (b) we avoid
dispatching work to the SNAPSHOT pool twice.

Relates to a comment on #100657
2023-10-12 02:23:07 -04:00
Ievgen Degtiarenko 77b4fd12bf
Log aws request metrics (#100272)
This change logs the number of requests, exceptions and throttles from
the AWS S3 API, aggregated over a tumbling window.
2023-10-10 17:16:04 +02:00
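For readers unfamiliar with the term used in #100272, a tumbling window is a fixed-length, non-overlapping time bucket: events are counted until the window closes, the totals are emitted, and counting starts again from zero. The sketch below is a minimal illustration under assumptions of our own; `TumblingWindowCounter`, its keys and the explicit timestamps are hypothetical and do not reflect the commit's implementation.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch of tumbling-window aggregation; not the Elasticsearch code.
class TumblingWindowCounter {
    private final long windowMillis;
    private long windowStart;
    private final Map<String, Long> counts = new HashMap<>();

    TumblingWindowCounter(long windowMillis, long startMillis) {
        this.windowMillis = windowMillis;
        this.windowStart = startMillis;
    }

    // Records one event. If the current window has elapsed, returns the totals of
    // the closed window (these would be logged) and starts a fresh window first.
    Optional<Map<String, Long>> record(String key, long nowMillis) {
        Optional<Map<String, Long>> flushed = Optional.empty();
        long elapsed = nowMillis - windowStart;
        if (elapsed >= windowMillis) {
            flushed = Optional.of(new HashMap<>(counts));
            counts.clear();
            // Advance by whole windows so window boundaries stay aligned.
            windowStart += (elapsed / windowMillis) * windowMillis;
        }
        counts.merge(key, 1L, Long::sum);
        return flushed;
    }
}
```

Passing the clock value in explicitly keeps the sketch deterministic and easy to test.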
David Turner 969fd2acbf
Snapshot deletion process cleanups (#100568)
Reorders the methods involved in snapshot deletion to be closer together
and better match the flow of execution, and harmonises the names of many
parameters and local variables to make it easier to follow them through
the process.
2023-10-10 13:39:58 +01:00
Yang Wang 525fe59ee2
Make APM meter available in s3 blobstore (#100464)
This PR wires the new Meter interface into S3BlobStore. The new meter
field remains unused in this PR. Actual metric collection will be
addressed in follow-ups.

Relates: ES-6801
2023-10-09 06:01:20 -04:00
Yang Wang a4db40d89c
Record operation purpose for s3 stats collection (#100236)
A new no-op OperationPurpose parameter was added in #99615 to all blob
store/container operation methods. This PR updates the s3 stats
collection code to actually use this parameter for finer-grained stats
collection and reports. This differentiation between purposes is kept
internally for now. The stats are currently aggregated over operations for
existing stats reporting. This means responses from both the
GetRepositoriesMetering API and the GetBlobStoreStats API will not be changed.
We will have follow-ups to expose the finer stats separately.

Relates: #99615
Relates: ES-6800
2023-10-09 19:55:31 +11:00
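The arrangement described in #100236 (count per operation and purpose internally, but report per operation) can be sketched as follows. The class, the enum constants and the method names are illustrative assumptions, not the identifiers used in the S3 repository code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; enum values and names are illustrative only.
class PurposeStats {
    enum Operation { GET, LIST }
    enum Purpose { CLUSTER_STATE, INDICES, TRANSLOG }

    // Internal key: the finer-grained (operation, purpose) pair.
    record Key(Operation operation, Purpose purpose) {}

    private final Map<Key, Long> counts = new HashMap<>();

    void record(Operation operation, Purpose purpose) {
        counts.merge(new Key(operation, purpose), 1L, Long::sum);
    }

    // Finer-grained view, kept internal for now.
    long count(Operation operation, Purpose purpose) {
        return counts.getOrDefault(new Key(operation, purpose), 0L);
    }

    // Existing reporting shape: purposes are summed away, so the
    // per-operation totals look the same as before the change.
    Map<Operation, Long> aggregatedByOperation() {
        Map<Operation, Long> totals = new HashMap<>();
        counts.forEach((key, value) -> totals.merge(key.operation(), value, Long::sum));
        return totals;
    }
}
```

Keeping the pair as the internal key means the finer stats can later be exposed without changing how they are collected.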
Armin Braun b7eafce32c
Make some practically static methods static (#97565)
Another round of automated fixes to this, marking things that can be
made static as static. Saves some JIT cycles but also turns some lambdas
from capturing to non-capturing and makes the "utilityness" of some
classes visible.
2023-10-06 23:37:07 +02:00
Yang Wang 5628392fa5
Differentiate stats for the same blobstore operation with purposes (#99615)
Today blobstore stats are collected against each HTTP operation, e.g.
Get, List. This is not granular enough because the same HTTP operation
can be performed for different purposes, e.g. cluster state, indices or
translog. This PR adds a new Purpose enum to provide further breakdown
for the same HTTP operation. 

Relates: ES-6800
2023-10-02 06:37:08 -04:00
Przemyslaw Gomulka eca41871aa
Use TelemetryProvider in Plugin::createComponents (#99737)
In order to avoid adding yet another parameter to createComponents,
the Tracer interface is replaced with TelemetryProvider.
This allows getting both the Tracer and (in the future) Metric interfaces.
2023-09-22 14:48:11 +02:00
Przemyslaw Gomulka b6747b48ba
Rename tracing to telemetry package (#99710)
This commit renames tracing to telemetry.tracing in both xpack/APM and elasticsearch's org.elasticsearch.tracing.Tracer (the API).
The xpack/APM packages are renamed as follows:
org.elasticsearch.telemetry.apm - the only exported package
org.elasticsearch.telemetry.apm.settings - APMSettings
org.elasticsearch.telemetry.apm.tracing - APMTracer

org.elasticsearch.tracing.Tracer is moved to org.elasticsearch.telemetry.tracing.Tracer (responsible for majority of the changes in this PR)
2023-09-20 16:58:02 +02:00
David Turner 4ee229779b
Clean up delete code in S3BlobContainer (#99447)
Simplifies things using utils from `Iterators` that didn't exist when
the code was first written.
2023-09-12 07:16:39 +01:00
David Turner 082f36578d
Remove string-based scheduleUnlessShuttingDown (#99131)
Relates #99051, #99027
2023-09-04 03:31:58 -04:00
Simon Cooper e1f353c2cf
Convert even more index created version to IndexVersion (#99088)
2023-08-31 13:08:26 +01:00
David Turner a20ee3f8f2
Migrate simple usages of ThreadPool#schedule (#99051)
In #99027 we deprecated the string-based version of
`ThreadPool#schedule`. This commit migrates all the simple usages of
this API to the new version.
2023-08-31 07:37:31 +01:00
Francisco Fernández Castaño f6a2b5c9ef
Add bulk delete method to BlobStore interface and implementations (#98948)
2023-08-29 12:25:03 +02:00
David Turner e4af2bfe92
Add TTL on S3 CAS uploads (#98664)
Compare-and-swap operations on a S3 repository are implemented using
multipart uploads. Today to try and avoid collisions we refuse to
perform a compare-and-swap if there are other concurrent uploads in
process. However this means that a node which crashes partway through a
compare-and-swap will block all future register operations.

With this commit we introduce a time-to-live on S3 multipart uploads,
such that uploads older than the TTL now do not block future
compare-and-swap attempts.
2023-08-23 06:51:58 +01:00
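The rule introduced by #98664 amounts to ignoring stale multipart uploads when deciding whether a compare-and-swap may proceed. A minimal sketch, assuming a hypothetical `Upload` record and helper name (neither appears in the commit), and assuming an upload exactly as old as the TTL still blocks:

```java
import java.time.Duration;
import java.time.Instant;
import java.util.List;

// Hypothetical sketch of the TTL filter; not the S3BlobContainer code.
class UploadTtl {
    record Upload(String id, Instant initiated) {}

    // Returns only the uploads that should still block a compare-and-swap:
    // those initiated no longer ago than the TTL. Older uploads are treated as
    // abandoned (for example by a node that crashed part-way through).
    static List<Upload> blockingUploads(List<Upload> uploads, Instant now, Duration ttl) {
        Instant cutoff = now.minus(ttl);
        return uploads.stream()
            .filter(upload -> upload.initiated().isBefore(cutoff) == false)
            .toList();
    }
}
```

With a one-hour TTL, an upload started two hours ago is ignored, while one started thirty minutes ago still counts as contention.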
Yang Wang 93ba27697e
Collect additional object store stats for S3 (#98083)
This PR adds additional stats collection for s3, including Delete and
Abort. It also fixes an issue where ListNextBatchObject is not metered.
2023-08-03 06:31:04 -04:00
Simon Cooper 55cf37cedd
Migrate Snapshot repository version to IndexVersion (#97226)
2023-07-04 11:42:46 +01:00
Simon Cooper 5486667d73
Convert snapshot version to IndexVersion (#96857)
2023-06-28 16:04:19 +01:00
Ievgen Degtiarenko d9b6c5ae29
Wire IndicesService to plugins (#97081)
This change exposes IndicesService to the plugins via Plugin#createComponents
2023-06-27 18:02:23 +02:00
Armin Braun dd7d381922
Dry up getting cluster admin client in tests (#96952)
Drying this up further and adding the same short-cut for single node
tests. Dealing with most of the spots that I could grab via automatic
refactorings.
2023-06-22 14:27:23 +02:00
Tim Brooks ac829edc55
Enable skip methods on retrying inputstreams (#96337)
Currently we have a number of input streams that specifically override
the skip() method disabling the ability to skip bytes. In each case the
skip implementation works as we have properly implemented the
read(byte[]) methods used to discard bytes. However, we appear to have
disabled it as it would be possible to retry from the end of a skip if
there is a failure in the middle. At this time, that optimization is not
really necessary, however, we sporadically used skip so it would be nice
for the IS to support the method. This commit enables the super.skip()
and adds a comment about future optimizations.
2023-05-25 10:11:27 -06:00
David Turner 9761089698
Fix NPE in S3BlobContainer (#96168)
Relates #96019 Closes #96162
2023-05-16 11:36:12 -04:00
David Turner 350beea181
Arbitrary bytes in blob store register (#96019)
Today the blob store register supports recording only a `long`,
represented as an 8-byte blob. We need to store a little more data in
the register, so this commit generalises things to work with a
`BytesReference` directly.
2023-05-16 06:16:21 -04:00
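The "`long` represented as an 8-byte blob" encoding that #96019 generalises away from can be sketched with plain JDK classes. The helper names are hypothetical, and big-endian byte order (the `ByteBuffer` default) is an assumption made for this illustration, not something the commit message states.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch of storing a long register value as an 8-byte blob.
class LongRegister {
    // Encodes the value as exactly 8 bytes (big-endian, the ByteBuffer default).
    static byte[] encode(long value) {
        return ByteBuffer.allocate(Long.BYTES).putLong(value).array();
    }

    // Decodes an 8-byte blob back to a long, rejecting any other length.
    static long decode(byte[] blob) {
        if (blob.length != Long.BYTES) {
            throw new IllegalArgumentException("expected " + Long.BYTES + " bytes but got " + blob.length);
        }
        return ByteBuffer.wrap(blob).getLong();
    }
}
```

The fixed length is what makes this format restrictive: anything beyond a single `long` does not fit, which is the motivation the commit gives for working with arbitrary bytes instead.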
Rene Groeschke 44cc172219
Update Gradle wrapper to 8.1 (#94663)
- Update docker compose plugin to use 8.1 compliant version
- Fix deprecations of test task configurations
2023-04-13 16:11:51 +02:00
Mark Vieira cbc73a7665
Register test artifacts for service-account security QA project (#94602)
2023-03-21 12:15:05 -07:00
David Turner 49d5cd7f26
S3 compare-and-exchange implementation (#94150)
Adds an implementation of `compareAndExchangeRegister` to
`S3BlobContainer`.
2023-02-28 06:44:15 -05:00
David Turner 95daf492fc
Async blob-store compare-and-exchange API (#94092)
Further work towards the S3 compare-and-exchange implementation showed
that we would like this API to permit async operations. This commit
moves to an async API.

Also, this change made it fairly awkward to use an exception to deliver
to the caller the indication that the current value could not be read,
so this commit adjusts things to use `OptionalLong` throughout as
suggested in the discussion on #93955.
2023-02-27 08:41:34 +00:00
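The API shape described in #94092 (an asynchronous result, with an empty `OptionalLong` rather than an exception when the current value cannot be read) can be sketched as below. `CompletableFuture` is used purely as a stand-in for whatever async mechanism the real API uses, and `InMemoryRegister` with its `setReadable` switch is a hypothetical test double.

```java
import java.util.OptionalLong;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical in-memory register; CompletableFuture is an assumed stand-in.
class InMemoryRegister {
    private final AtomicLong value = new AtomicLong();
    private volatile boolean readable = true;

    // Simulates a register whose current value cannot be read.
    void setReadable(boolean readable) {
        this.readable = readable;
    }

    // Completes with the witnessed value: equal to `expected` if the update was
    // applied, a different value if it was not, or empty if the current value
    // could not be read at all.
    CompletableFuture<OptionalLong> compareAndExchange(long expected, long updated) {
        if (readable == false) {
            return CompletableFuture.completedFuture(OptionalLong.empty());
        }
        long witness = value.compareAndExchange(expected, updated);
        return CompletableFuture.completedFuture(OptionalLong.of(witness));
    }
}
```

A caller distinguishes three outcomes without any exception handling: empty (unreadable), a witness equal to the expected value (success), or any other witness (lost the race).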
Armin Braun a6f63df111
Introduce BlobStoreRepository CAS Mechanism (#93825)
Only for testing purposes through the `FsRepository` for now and rather simple,
but should get the job done and technically be correct for a compliant NFS implementation.

Co-authored-by: David Turner <david.turner@elastic.co>
2023-02-16 14:26:12 +00:00
Joe Gallo 582f1be95e
Update log4j2 LICENSE and NOTICE files (#93611)
2023-02-09 08:53:43 -05:00
Armin Braun f2760c6e18
Nicer buffer handling (#93491)
Some optimisations that I found when reusing searchable snapshot code elsewhere:
* Add an efficient input stream -> byte buffer path that avoids allocations + copies for heap buffers, this is non-trivial in its effects IMO
  * Also at least avoid allocations and use existing thread-local buffer when doing input stream -> direct bb
  * move `readFully` to lower level streams class to enable this
* Use same thread local direct byte buffer for frozen and caching index input instead of constantly allocating new heap buffers and writing those to disk inefficiently
2023-02-06 10:55:56 +01:00
Ievgen Degtiarenko ad229dd70e
Update createComponents to supply AllocationService instead of AllocationDeciders (#92785)
2023-01-10 14:18:33 +01:00
Artem Prigoda 2bc7398754
Use `Strings.format` instead of `String.format(Locale.ROOT, ...)` in tests (#92106)
Use the locale-independent `Strings.format` method instead of `String.format(Locale.ROOT, ...)`.
Inline `ESTestCase.forbidden` calls with `Strings.format` for consistency's sake.
Add `Strings.format` alias in `common.Strings`
2023-01-03 19:28:27 +01:00
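The motivation for a locale-independent helper like the one mentioned in #92106 is that `String.format` without an explicit locale uses the JVM default, so output such as decimal separators can vary between machines. The sketch below shows one plausible shape for such a helper; `Formats.format` is a hypothetical name, and delegating to `Locale.ROOT` is an assumption about how a helper of this kind would typically be written.

```java
import java.util.Locale;

// Hypothetical helper; assumed to simply pin the locale to Locale.ROOT.
class Formats {
    // Formats with a fixed locale so results do not depend on the JVM default.
    static String format(String pattern, Object... args) {
        return String.format(Locale.ROOT, pattern, args);
    }
}
```

With a helper like this, `Formats.format("%.1f", 1.5)` yields the same text regardless of the default locale, and call sites no longer repeat the `Locale.ROOT` argument.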
Mark Vieira c2eda511de
Add JUnit rule based integration test cluster orchestration framework (#92379)
This commit adds a new test framework for configuring and orchestrating
test clusters for both Java and YAML REST testing. This will eventually
replace the existing "test-clusters" Gradle plugin and the build-time
cluster orchestration.
2022-12-21 15:33:46 -08:00
Yang Wang b22719844d
Add getRandom method to BuildParams for convenience (#91674)
It should help with reducing the ceremonies needed for getting a
reproducible random value (mostly boolean) in build.gradle files.

Relates: https://github.com/elastic/elasticsearch/pull/91536#discussion_r1026075192
2022-11-19 13:26:54 +11:00
Armin Braun 362a7f0a95
Make some ByteSizeValue instances always use the singleton (#91178)
This comes out of a user heap dump investigation. In some snapshot
corner cases we ran into about 100M of duplicate 0b instances.

-> even though it's a little heavy-handed, let's make it so the common
constants that we already have are used whenever possible.
2022-10-28 16:53:49 +02:00
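The idea in #91178, returning a shared constant instead of allocating a new object for a common value, can be sketched with a static factory. `ByteSize` and `ofBytes` are hypothetical names for this illustration and are not the `ByteSizeValue` API.

```java
// Hypothetical value class showing the "reuse the singleton" pattern.
final class ByteSize {
    static final ByteSize ZERO = new ByteSize(0);

    private final long bytes;

    // Private constructor: all instances come from the factory below.
    private ByteSize(long bytes) {
        this.bytes = bytes;
    }

    // Returns the shared ZERO constant for the common case instead of
    // allocating a duplicate instance every time.
    static ByteSize ofBytes(long bytes) {
        return bytes == 0 ? ZERO : new ByteSize(bytes);
    }

    long bytes() {
        return bytes;
    }
}
```

Because construction is funnelled through one factory, every zero-sized value resolves to the same object, which is what removes the duplicated instances seen in a heap dump.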
Rene Groeschke 43a0377735
Update forbiddenapis to 3.4 (#90624)
Fix breaking changes to source validation after change in default jdk rule set
2022-10-06 16:52:06 +02:00
Armin Braun 97c533a562
Increase snapshot pool max size to 10 (#90282)
As discussed, we can be up to twice as fast without increasing CPU use
much on high latency blob stores so increasing the pool size to 10 here
to better utilize larger data nodes.
2022-09-23 17:06:57 +02:00
Artem Prigoda db359d9693
Log unsuccessful attempts to get credentials from web identity tokens (#88241)
Currently, we only verify that the local environment for web identity tokens is correctly set up, but we don't verify whether it's
possible to exchange the token for credentials from the STS. If we can't get credentials from the STS, we silently fall back
to the EC2 credentials provider. Let's try to log the web identity token auth errors, so the users get a clear message in the logs in case the STS is unavailable for the ES server.
2022-09-08 20:34:19 +02:00
Francisco Fernández Castaño 7a07853965
Add SDK request logging to debug failures of S3BlobStoreRepositoryTests#testRequestStats (#89912)
Relates #88841
2022-09-08 16:49:45 +02:00
Nikola Grcevski fc819609a1
Add allocation deciders in createComponents (#89836)
With this change we add the allocation deciders
to createComponents so that we can simplify their use in the
Autoscaling plugin and implement a reserved state handler
in the future.
2022-09-07 09:28:07 -04:00
Artem Prigoda 9b459a25c8
[repository-s3] Update the AWS SDK to 1.12.270 (#88932)
* Update commons-codec to 1.15
* Add exceptions for unused classes
2022-09-05 10:21:44 +02:00