Improve test stability by waiting for the relevant metrics to appear on /metrics before the
first check on the desired shard count.
Increase the scrape interval to avoid timeouts, as 100 ms may be insufficient for Prometheus
to scrape itself in some environments (e.g., CI).
Have Prometheus scrape itself multiple times to increase the volume of data sent and help
fill the queue more quickly.
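For illustration, a minimal sketch of the waiting pattern, with a hypothetical helper name and polling parameters (not the actual test code):

```go
package test

import (
	"io"
	"net/http"
	"strings"
	"testing"
	"time"
)

// waitForMetric polls the given /metrics endpoint until the metric name
// appears in the response body, failing the test after the timeout. The
// helper name and intervals are illustrative assumptions.
func waitForMetric(t *testing.T, metricsURL, metricName string, timeout time.Duration) {
	t.Helper()
	deadline := time.Now().Add(timeout)
	for time.Now().Before(deadline) {
		resp, err := http.Get(metricsURL)
		if err == nil {
			body, _ := io.ReadAll(resp.Body)
			resp.Body.Close()
			if strings.Contains(string(body), metricName) {
				return // The metric is exposed; it is now safe to assert on it.
			}
		}
		time.Sleep(100 * time.Millisecond)
	}
	t.Fatalf("metric %q did not appear on %s within %v", metricName, metricsURL, timeout)
}
```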
Signed-off-by: machine424 <ayoubmrini424@gmail.com>
The original implementation in #9705 for native histograms included a
technical debt (#15177): samples were committed ordered by type,
not by their append order. This was fixed in #17071, but this docstring
was not updated.
I also took the liberty of mentioning that we do not order by timestamp
either, so it is possible to append out-of-order samples.
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
* Add a nav title to fix docs website generator.
* Make it clearer that "Prometheus Agent" is a mode, not a separate service.
* Add to index.
* Clean up some wording.
* Add a downsides section.
Signed-off-by: SuperQ <superq@gmail.com>
ReduceResolution is currently called before validation during
ingestion. This will cause a panic if there are not enough buckets in
the histogram. If there are too many buckets, the spurious buckets are
ignored, and therefore the error in the input histogram is masked.
Furthermore, invalid negative offsets might cause problems, too.
Therefore, we need to do some minimal validation in reduceResolution.
Fortunately, it is easy and shouldn't slow things down. Sadly, it
requires returning errors, which triggers a bunch of code changes.
But even here there is a bright side: we can get rid of a few panics.
(Remember: Don't panic!)
In other news, we haven't done a full validation of histograms
read via remote-read. This is not so much a security concern (as you
can throw off Prometheus easily by feeding it bogus data via
remote-read) but more that remote-read sources might be makeshift and
could accidentally create invalid histograms. We really don't want to
panic in that case. So this commit not only adds a check of the
spans and buckets as needed for resolution reduction but also a full
validation during remote-read.
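As a rough illustration of the minimal check (a sketch, not the actual Prometheus code): the bucket count promised by the spans must match the actual number of buckets, and a negative offset is only allowed on the first span, with an error returned instead of a panic:

```go
package sketch

import (
	"fmt"

	"github.com/prometheus/prometheus/model/histogram"
)

// checkSpansVsBuckets sketches the minimal validation described above. The
// function name is an assumption; it is not the actual Prometheus helper.
func checkSpansVsBuckets(spans []histogram.Span, numBuckets int) error {
	var want int
	for i, s := range spans {
		if i > 0 && s.Offset < 0 {
			return fmt.Errorf("span %d has invalid negative offset %d", i, s.Offset)
		}
		want += int(s.Length)
	}
	if want != numBuckets {
		return fmt.Errorf("spans describe %d buckets, but %d buckets provided", want, numBuckets)
	}
	return nil
}
```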
Signed-off-by: beorn7 <beorn@grafana.com>
To reduce main UI clutter, I added a new settings submenu above the chart
itself for the new setting. So far it only has the one new axis setting, but it
could accommodate further settings in the future.
For now I'm only adding a boolean on/off setting to the UI that pins the Y axis
minimum to 0 or not. However, the underlying stored URL field is already named
y_axis_min={number} and would support other Y axis minima, in case we want to
support custom values in the UI in the future - but then we'd probably also
want to add an axis maximum and possibly other settings.
Fixes https://github.com/prometheus/prometheus/issues/520
Signed-off-by: Julius Volz <julius.volz@gmail.com>
Currently, iterating over histogram buckets can panic if the spans are
not consistent with the buckets. We aim for validating histograms upon
ingestion, but there might still be data corruptions on disk that
could trigger the panic. While data corruption on disk is really bad
and will lead to all kinds of weirdness, we should still avoid
panicking.
Note, though, that chunks are secured by checksums, so the corruptions
won't realistically happen because of disk faults, but more likely
because a chunk was generated in a faulty way in the first place, by
a software bug or even maliciously.
This commit prevents panics in the situation where there are fewer
buckets than described by the spans. Note that the missing buckets
will simply not be iterated over. There is no signalling of this
problem. We might still consider this separately, but for now, I would
say that this kind of corruption is exceedingly rare and doesn't
deserve special treatment (which would add a whole lot of complexity to
the code).
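A sketch of the iteration behavior described above (illustrative, not the actual iterator code):

```go
package sketch

import "github.com/prometheus/prometheus/model/histogram"

// iterateBuckets visits each populated bucket together with its index. If
// there are fewer buckets than the spans describe, it simply stops instead of
// reading past the end of the slice; the missing buckets are silently skipped.
func iterateBuckets(spans []histogram.Span, buckets []float64, visit func(idx int32, count float64)) {
	next := 0 // Position in the buckets slice.
	var idx int32
	for _, s := range spans {
		idx += s.Offset
		for i := uint32(0); i < s.Length; i++ {
			if next >= len(buckets) {
				return // Corrupt data: spans promise more buckets than exist.
			}
			visit(idx, buckets[next])
			next++
			idx++
		}
	}
}
```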
Signed-off-by: beorn7 <beorn@grafana.com>
Methods added:
- `SampleOffset(metric *labels.Labels) float64` to calculate the sample offset for a given label set.
- `AddRatioSampleWithOffset(ratioLimit, sampleOffset float64) bool` to find out whether a given sample offset falls within a given ratio limit.
The already existing method `AddRatioSample(ratioLimit float64, sample *Sample) bool` is now implemented as a simple combination of the two other methods. Exposing these methods helps downstream projects re-use the implementations and makes testing easier.
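For illustration, the composition might look roughly like this (the receiver name and type are assumptions, not the actual code):

```go
// AddRatioSample is now just a combination of the two exposed methods.
// Illustrative sketch: "ratioSampler" is a hypothetical receiver type, and
// Sample/labels.Labels are the existing promql/labels types.
func (r *ratioSampler) AddRatioSample(ratioLimit float64, sample *Sample) bool {
	return r.AddRatioSampleWithOffset(ratioLimit, r.SampleOffset(&sample.Metric))
}
```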
Signed-off-by: Andrew Hall <andrew.hall@grafana.com>
Improve the repo sync logging output and add some additional logging.
This should help with debugging failed updates.
Signed-off-by: SuperQ <superq@gmail.com>
* drop extra label from receiver
Signed-off-by: pipiland2612 <nguyen.t.dang.minh@gmail.com>
* used constant
Signed-off-by: pipiland2612 <nguyen.t.dang.minh@gmail.com>
---------
Signed-off-by: pipiland2612 <nguyen.t.dang.minh@gmail.com>
* fix: aws discovery test fix
Fixes a problem introduced by the merge of
https://github.com/prometheus/prometheus/pull/17138, which didn't take into
account another merged PR!
```
discovery/aws/aws.go:218:54: too many arguments in call to NewEC2Discovery
have (*EC2SDConfig, *slog.Logger, *ec2Metrics)
want (*EC2SDConfig, discovery.DiscovererOptions)
discovery/aws/aws.go:222:66: too many arguments in call to NewLightsailDiscovery
have (*LightsailSDConfig, *slog.Logger, *lightsailMetrics)
want (*LightsailSDConfig, discovery.DiscovererOptions)
```
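Based purely on the compiler error above, the fix bundles the logger and metrics into a single options value; the struct field names below are assumptions for illustration:

```go
// Sketch of the corrected call: pass one discovery.DiscovererOptions instead
// of separate logger and metrics arguments. Field names are assumed here.
d, err := NewEC2Discovery(conf, discovery.DiscovererOptions{
	Logger:  logger,
	Metrics: metrics,
})
```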
Signed-off-by: Will Bollock <wbollock@linode.com>
* fix: align ecs style
ECS is a new service discovery mechanism that was added after this PR was
merged: https://github.com/prometheus/prometheus/pull/17138
This aligns it with the style of passing a single "opts" argument, which
almost all the other service discovery engines now use.
Signed-off-by: Will Bollock <wbollock@linode.com>
---------
Signed-off-by: Will Bollock <wbollock@linode.com>
Corrected the structure of the explanation regarding how samples are organized in the chunks directory and the handling of deletion records.
Signed-off-by: Raul Leite <sp4wn.root@gmail.com>
Partially fixes https://github.com/prometheus/prometheus/issues/17416 by
renaming all CT* names to ST* throughout the codebase, except RW2 (this is
done in a separate [PR](https://github.com/prometheus/prometheus/pull/17411))
and the PrometheusProto exposition proto.
```
CreatedTimestamp -> StartTimestamp
CreatedTimeStamp -> StartTimestamp
created_timestamp -> start_timestamp
CT -> ST
ct -> st
```
Signed-off-by: bwplotka <bwplotka@gmail.com>
OTLP Receiver: Only write metadata to the WAL when the metadata-wal-records feature flag (--enable-feature=metadata-wal-records) is enabled.
---------
Signed-off-by: pipiland2612 <nguyen.t.dang.minh@gmail.com>
In this PR, we are eliminating expensive string-keyed (by signature) maps that are accessed for every sample processed. During preprocessing in rangeEval, we assign a unique number from 0 to n-1 to each of the n string signature values, and later only use this number as a label set signature.
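A minimal standalone sketch of the technique (not the actual rangeEval code):

```go
package main

import "fmt"

// assignIDs gives each distinct signature string a dense ID in [0, n), so the
// hot path can index a slice instead of hashing strings for every sample.
func assignIDs(signatures []string) map[string]int {
	ids := make(map[string]int, len(signatures))
	for _, sig := range signatures {
		if _, ok := ids[sig]; !ok {
			ids[sig] = len(ids)
		}
	}
	return ids
}

func main() {
	ids := assignIDs([]string{`{job="a"}`, `{job="b"}`, `{job="a"}`})
	sums := make([]float64, len(ids))
	// Hot path: one slice index per sample instead of a string-keyed map lookup.
	sums[ids[`{job="a"}`]] += 1
	sums[ids[`{job="b"}`]] += 2
	sums[ids[`{job="a"}`]] += 3
	fmt.Println(sums) // [4 2]
}
```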
Signed-off-by: Linas Medžiūnas <linasm@users.noreply.github.com>
Co-authored-by: George Krajcsovits <krajorama@users.noreply.github.com>
This metric can be used to create alerts based on how many
notifications do or do not finish within a certain amount of time.
Change-Id: afbf3d8ceb3994c7d6220389353cff92
Signed-off-by: Kevin Hellemun <17928966+OGKevin@users.noreply.github.com>
Co-authored-by: Björn Rabenstein <github@rabenste.in>
---------
Signed-off-by: Kevin Hellemun <17928966+OGKevin@users.noreply.github.com>
Co-authored-by: Björn Rabenstein <github@rabenste.in>
It is fairly common that the next item is the one we want, and cheap
to check.
We could also start the binary search one position further on, but strangely
that slows it down.
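A minimal sketch of the shortcut (illustrative, not the actual code):

```go
package sketch

import "sort"

// searchAtLeast returns the first position at or after start whose value is
// >= target. It first checks the immediately following item, which is a
// common hit and cheap to test, before falling back to binary search.
func searchAtLeast(xs []int64, start int, target int64) int {
	if start < len(xs) && xs[start] >= target {
		return start // The next item is already the one we want.
	}
	// Search the whole remainder: starting one position later was
	// measured to be slower.
	return start + sort.Search(len(xs)-start, func(i int) bool {
		return xs[start+i] >= target
	})
}
```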
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>