Commit Graph

16488 Commits

Author SHA1 Message Date
Ayoub Mrini e7999528fa
fix(test): make TestRemoteWrite_ReshardingWithoutDeadlock more reliable and re-enable it (#17490)
Improve test stability by waiting for the relevant metrics to appear on /metrics before the
first check on the desired shard count.

Increase the scrape interval to avoid timeouts, as 100 ms may be insufficient for Prometheus
to scrape itself in some environments (e.g., CI).

Have Prometheus scrape itself multiple times to increase the volume of data sent and help
fill the queue more quickly.

Signed-off-by: machine424 <ayoubmrini424@gmail.com>
2025-11-25 21:30:42 +00:00
George Krajcsovits a66c696530
chore(storage): update docstring (#17609)
The original implementation in #9705 for native histograms included a
technical debt #15177 where samples were committed ordered by type
not by their append order. This was fixed in #17071, but this docstring
was not updated.

I've also taken the liberty of mentioning that we do not order by timestamp
either, thus it is possible to append out-of-order samples.

Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
2025-11-25 15:33:35 +01:00
Faustas Butkus e43f1bafca
chore: fix rangeEval comment (#17607)
Signed-off-by: Faustas Butkus <faustas.butkus@chronosphere.io>
2025-11-25 09:06:30 +00:00
Julius Volz fb47037435
Merge pull request #17602 from ADITYATIWARI342005/fix/closingAutocomplete
[BUGFIX] UI: Fixed codemirror-promql incorrectly showing label completion suggestions
2025-11-24 11:37:38 +01:00
ADITYA TIWARI 02f405692e fix: autocomplete suggestions for using cursor position
Signed-off-by: ADITYA TIWARI <adityatiwari342005@gmail.com>
2025-11-24 09:49:08 +00:00
Bartlomiej Plotka b2eb2bc989
chore(labels): add more context to labels.MetricName deprecation. (#17590)
Signed-off-by: bwplotka <bwplotka@gmail.com>
2025-11-24 08:31:28 +00:00
Ben Kochie d0d2699dc5
Update Prometheus Agent doc (#17591)
* Add a nav title to fix docs website generator.
* Make it clearer that "Prometheus Agent" is a mode, not a separate
  service.
* Add to index.
* Cleanup some wording.
* Add a downsides section.

Signed-off-by: SuperQ <superq@gmail.com>
2025-11-21 11:34:19 +01:00
Björn Rabenstein b8d19543b8
Add histogram validation in remote-read and during reducing resolution (#17561)
ReduceResolution is currently called before validation during
ingestion. This will cause a panic if there are not enough buckets in
the histogram. If there are too many buckets, the spurious buckets are
ignored, and therefore the error in the input histogram is masked.

Furthermore, invalid negative offsets might cause problems, too.

Therefore, we need to do some minimal validation in reduceResolution.
Fortunately, it is easy and shouldn't slow things down. Sadly, it
requires returning errors, which triggers a bunch of code changes.
Even here there is a bright side: we can get rid of a few panics.
(Remember: Don't panic!)

In different news, we haven't done a full validation of histograms
read via remote-read. This is not so much a security concern (as you
can throw off Prometheus easily by feeding it bogus data via
remote-read) but more that remote-read sources might be makeshift and
could accidentally create invalid histograms. We really don't want to
panic in that case. So this commit does not only add a check of the
spans and buckets as needed for resolution reduction but also a full
validation during remote-read.

Signed-off-by: beorn7 <beorn@grafana.com>
2025-11-21 00:22:24 +01:00
Ben Kochie fc27eef43f
Group more dependabot updates (#17563)
Reduce the number of dependabot PRs for related updates.

Signed-off-by: SuperQ <superq@gmail.com>
2025-11-20 15:36:21 +01:00
Julius Volz 36d054cb2e
UI: Add graph option to start the chart's Y axis at zero (#17565)
To reduce main UI clutter, I added a new settings submenu above the chart
itself for the new setting. So far it only has the one new axis setting, but it
could accommodate further settings in the future.

For now I'm only adding a boolean on/off setting to the UI to set the Y axis to
0 or not. However, the underlying stored URL field is already named
y_axis_min={number} and would support other Y axis minima, in case we want to
support custom values in the UI in the future - but then we'd probably also
want to add an axis maximum and possibly other settings.

Fixes https://github.com/prometheus/prometheus/issues/520

Signed-off-by: Julius Volz <julius.volz@gmail.com>
2025-11-20 14:28:18 +01:00
Björn Rabenstein 5947cc1459
Merge pull request #17559 from prometheus/beorn7/histogram3
model/histogram: Make histogram bucket iterators more robust
2025-11-20 14:07:49 +01:00
Julien 61f64a4cb1
Makefile.common: Use git ls-files instead of find for license check and style check (#17557)
Also improve find fallback to use -prune for better performance.

Signed-off-by: Julien Pivotto <291750+roidelapluie@users.noreply.github.com>
2025-11-20 08:56:39 +01:00
beorn7 2dfc324821 model/histogram: Make histogram bucket iterators more robust
Currently, iterating over histogram buckets can panic if the spans are
not consistent with the buckets. We aim to validate histograms upon
ingestion, but there might still be data corruptions on disk that
could trigger the panic. While data corruption on disk is really bad
and will lead to all kinds of weirdness, we should still avoid
panicking.

Note, though, that chunks are secured by checksums, so the corruptions
won't realistically happen because of disk faults, but more likely
because a chunk was generated in a faulty way in the first place, by
a software bug or even maliciously.

This commit prevents panics in the situation where there are fewer
buckets than described by the spans. Note that the missing buckets
will simply not be iterated over. There is no signalling of this
problem. We might still consider this separately, but for now, I would
say that this kind of corruption is exceedingly rare and doesn't
deserve special treatment (which will add a whole lot of complexity to
the code).

Signed-off-by: beorn7 <beorn@grafana.com>
2025-11-19 16:37:51 +01:00
Grégoire 1174b0ce4f
model/textparse: Remove unit validation in protobuf parsing (#16834)
Signed-off-by: Gregoire Verdier <gregoire.verdier@gmail.com>
2025-11-19 14:03:32 +01:00
Björn Rabenstein d943f445f0
Merge pull request #17528 from prometheus/beorn7/histogram
cmd: Make feature flag `native-histograms` a no-op.
2025-11-19 10:42:04 +01:00
Julien e47c7e2f96
Merge pull request #17549 from prometheus/superq/improve_repo_sync_logging
Improve repo sync script logging
2025-11-18 16:24:21 +01:00
Andrew Hall 1193e63896
PromQL: Modify RatioSampler to expose more methods for the benefit of downstream projects (#17516)
Methods added:
- `SampleOffset(metric *labels.Labels) float64` to calculate the sample offset for a given label set.
- `AddRatioSampleWithOffset(ratioLimit, sampleOffset float64) bool` to find out whether a given sample offset falls within a given ratio limit.

The already existing method `AddRatioSample(ratioLimit float64, sample *Sample) bool` is now implemented as a simple combination of the two other methods. Exposing these methods helps downstream projects re-use the implementations and makes testing easier.

Signed-off-by: Andrew Hall <andrew.hall@grafana.com>
2025-11-18 15:44:40 +01:00
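
How one method can be a combination of the two others, as the commit above describes, might look like the following sketch. Everything here is an assumption for illustration: the lower-case function names, the use of a string with an FNV-1a hash in place of a label set, and the selection rule for negative ratios are not taken from the Prometheus source.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"math"
)

// sampleOffset maps a series identity to a value in [0, 1].
// Assumption: an FNV-1a hash of a string stands in for a label-set hash.
func sampleOffset(series string) float64 {
	h := fnv.New64a()
	h.Write([]byte(series))
	return float64(h.Sum64()) / float64(math.MaxUint64)
}

// addRatioSampleWithOffset reports whether an offset falls within the
// ratio limit. Assumed rule: a non-negative limit selects offsets below
// it, and a negative limit selects the complementary upper part.
func addRatioSampleWithOffset(ratioLimit, offset float64) bool {
	if ratioLimit >= 0 {
		return offset < ratioLimit
	}
	return offset >= 1.0+ratioLimit
}

// addRatioSample is simply the composition of the two functions above.
func addRatioSample(ratioLimit float64, series string) bool {
	return addRatioSampleWithOffset(ratioLimit, sampleOffset(series))
}

func main() {
	series := `{job="api",instance="a"}`
	fmt.Println(sampleOffset(series), addRatioSample(0.5, series))
}
```

Splitting the computation this way lets a caller compute the offset once and test it against several limits, or test the limit logic with hand-picked offsets.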
SuperQ 4bee2c754e
Improve repo sync script logging
Improve the repo sync logging output and add some additional logging.
This should help debugging some failed updates.

Signed-off-by: SuperQ <superq@gmail.com>
2025-11-18 11:18:02 +01:00
Minh Nguyen 5087a25848
Remote Write Receive Fix: Remove duplicate labels when type-and-unit-label feature is on (#17546)
* drop extra label from receiver

Signed-off-by: pipiland2612 <nguyen.t.dang.minh@gmail.com>

* used constant

Signed-off-by: pipiland2612 <nguyen.t.dang.minh@gmail.com>

---------

Signed-off-by: pipiland2612 <nguyen.t.dang.minh@gmail.com>
2025-11-18 09:37:09 +00:00
Bartlomiej Plotka cefefc6897
prw2: Move Remote Write 2.0 CT to be per Sample; Rename to ST (start timestamp) (#17411)
Relates to
https://github.com/prometheus/prometheus/issues/16944#issuecomment-3164760343

Signed-off-by: bwplotka <bwplotka@gmail.com>
2025-11-17 14:59:40 +00:00
Laurent Dufresne d99f8dacc4
chore: remove dead code (#17542)
Signed-off-by: Laurent Dufresne <laurent.dufresne@grafana.com>
2025-11-17 10:37:55 +01:00
beorn7 be4efd740c cmd: Make feature flag `native-histograms` a no-op.
Signed-off-by: beorn7 <beorn@grafana.com>
2025-11-16 23:22:02 +01:00
Will Bollock 4aa8941eb1
fix(discovery): aws discovery test fix (#17527)
* fix: aws discovery test fix

Fixes a problem introduced after the merge of this https://github.com/prometheus/prometheus/pull/17138

PR didn't take into account another merged PR!

```
discovery/aws/aws.go:218:54: too many arguments in call to NewEC2Discovery
    have (*EC2SDConfig, *slog.Logger, *ec2Metrics)
    want (*EC2SDConfig, discovery.DiscovererOptions)
discovery/aws/aws.go:222:66: too many arguments in call to NewLightsailDiscovery
    have (*LightsailSDConfig, *slog.Logger, *lightsailMetrics)
    want (*LightsailSDConfig, discovery.DiscovererOptions)
```

Signed-off-by: Will Bollock <wbollock@linode.com>

* fix: align ecs style

ECS was a new service discovery tool added after this PR was merged: https://github.com/prometheus/prometheus/pull/17138

Aligns ECS with the style of passing a single "opts" argument, as almost
all the other service discovery engines now do

Signed-off-by: Will Bollock <wbollock@linode.com>

---------

Signed-off-by: Will Bollock <wbollock@linode.com>
2025-11-16 10:28:50 +00:00
0xkato ae00fd45ab
tsdb: guard chunk length overflow in head chunk reader (#17533)
Signed-off-by: 0xkato <0xkkato@gmail.com>
2025-11-15 21:09:00 +01:00
zenador c64dd612ef
PromQL: Fix bug with inconsistent results for queries with OR expression and EnableDelayedNameRemoval (#17161)
Signed-off-by: Jeanette Tan <jeanette.tan@grafana.com>
Signed-off-by: zenador <zenador@users.noreply.github.com>
Co-authored-by: Björn Rabenstein <github@rabenste.in>
2025-11-15 21:07:36 +01:00
Julius Volz 3c53abc9e7
Merge pull request #17537 from rauldsl/prometheus-docs
[Doc] Storage.md - Paragraph Rewrite
2025-11-15 19:02:15 +01:00
Raul Leite 407b697ee2
structure adjusted as recommended
Corrected the structure of the explanation regarding how samples are organized in the chunks directory and the handling of deletion records.

Signed-off-by: Raul Leite <sp4wn.root@gmail.com>
2025-11-15 10:55:26 -06:00
Raul Leite c9827ef983
Fix formatting in storage.md
extra space removed

Signed-off-by: Raul Leite <sp4wn.root@gmail.com>
2025-11-15 10:52:18 -06:00
Raul Leite e022a727a8 I’ve proposed a slight rewording of this section to improve clarity and readability. (On-Disk Layout Paragraph)
Signed-off-by: Raul Leite <sp4wn.root@gmail.com>
2025-11-14 15:31:21 -06:00
Bryan Boreham 1240402620
Merge pull request #17439 from bboreham/faster-postings
tsdb: couple of postings optimizations
2025-11-14 18:36:34 +01:00
Bryan Boreham b7aae06181
Merge pull request #17114 from bboreham/scrape-stale-by-ref
Scraping: detect staleness via unique reference
2025-11-14 18:32:26 +01:00
Ayoub Mrini 35c3232a2e
test: skip TestRemoteWrite_ReshardingWithoutDeadlock temporarily as flaky (#17534)
Signed-off-by: machine424 <ayoubmrini424@gmail.com>
2025-11-14 08:42:20 +00:00
Julius Hinze 987b28e26c
discovery: fix constructor arguments in aws discovery (#17526)
Signed-off-by: Julius Hinze <julius.hinze@grafana.com>
2025-11-13 15:59:14 +00:00
Bartlomiej Plotka f50ff0a40a
feat: rename CreatedTimestamp to StartTimestamp (#17523)
Partially fixes https://github.com/prometheus/prometheus/issues/17416 by
renaming all CT* names to ST* in the whole codebase except RW2 (this is
done in a separate
[PR](https://github.com/prometheus/prometheus/pull/17411)) and
PrometheusProto exposition proto.

```
CreatedTimestamp -> StartTimestamp
CreatedTimeStamp -> StartTimestamp
created_timestamp -> start_timestamp
CT -> ST
ct -> st

```

Signed-off-by: bwplotka <bwplotka@gmail.com>
2025-11-13 14:17:51 +00:00
Bartlomiej Plotka 675bafe2fb
Merge pull request #17441 from pipiland2612/refactor_queue_manger
Refactor part of queue_manger.go by creating struct to reuse some common function
2025-11-13 15:07:11 +01:00
Bryan Boreham e02a65b6bd
Merge pull request #17138 from wbollock/feat/prometheus_refresh_config_label
feat(metrics): add config label to refresh metrics
2025-11-13 14:51:39 +01:00
Bartlomiej Plotka e6b6005298
[PERF] PromQL: only reset labels builder when needed (#17524)
Signed-off-by: Linas Medziunas <linas.medziunas@gmail.com>
2025-11-13 14:34:28 +01:00
Linas Medziunas 85150f9dec [PERF] PromQL: only reset labels builder when needed
Signed-off-by: Linas Medziunas <linas.medziunas@gmail.com>
2025-11-13 11:37:53 +02:00
Minh Nguyen 7ebff91cfd
OTLP Receiver: Only update metadata to WAL when metadata-wal-records feature is enabled (#17472)
OTLP Receiver: Only update metadata to WAL when metadata-wal-records feature is enabled.

---------

Signed-off-by: pipiland2612 <nguyen.t.dang.minh@gmail.com>
2025-11-13 09:53:12 +01:00
Bryan Boreham f5d1cb48ca
Merge pull request #17519 from bboreham/defer-opname
[PERF] PromQL: Only look up operation name if we need it
2025-11-12 14:20:54 +01:00
Bryan Boreham 37d153e5b5 [PERF] PromQL: Only look up operation name if we need it
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2025-11-12 12:05:25 +00:00
Bryan Boreham a57aea2915
Improve assertion failure message (#17252)
Signed-off-by: Charles Korn <charles.korn@grafana.com>
Co-authored-by: Charles Korn <charles.korn@grafana.com>
2025-11-12 11:53:32 +01:00
Linas Medžiūnas f330ccaf2f
[PERF] PromQL: eliminate string-keyed maps in binary vector matching (#17131)
In this PR, we are eliminating expensive string-keyed (by signature) maps that are accessed for every sample processed. During preprocessing in rangeEval, we assign a unique number from 0 to n-1 to each of the n string signature values, and later only use this number as a label set signature.

Signed-off-by: Linas Medžiūnas <linasm@users.noreply.github.com>
Co-authored-by: George Krajcsovits <krajorama@users.noreply.github.com>
2025-11-12 10:43:05 +00:00
Kevin Hellemun 33082be0e2
feat: add histogram metric for notification_latency_seconds (#16637)
This metric can be used to create alerting based on how many
notifications finish or do not finish within a certain amount of time.

Change-Id: afbf3d8ceb3994c7d6220389353cff92
Signed-Off-By: Kevin Hellemun <17928966+OGKevin@users.noreply.github.com>
Co-authored-by: Björn Rabenstein <github@rabenste.in>

---------

Signed-off-by: Kevin Hellemun <17928966+OGKevin@users.noreply.github.com>
Co-authored-by: Björn Rabenstein <github@rabenste.in>
2025-11-11 13:47:37 +01:00
Clark McCauley 9d508a4888
util: add +Inf bucket in MetricFamiliesToWriteRequest when not present (#15864)
* Add +Inf bucket in MetricFamiliesToWriteRequest if not present

Signed-off-by: Clark McCauley <clarkmccauley@gmail.com>
2025-11-11 12:43:04 +01:00
Björn Rabenstein 269a166c18
Merge pull request #17515 from tcp13equals2/update_test_error_comparison
PromQL: Allow for promql tests to consider expected fail message during query preparation
2025-11-11 11:42:59 +01:00
Ben Ye 2e609511bb
Register missing metric prometheus_tsdb_sample_ooo_delta (#17477)
* register missing metric prometheus_tsdb_sample_ooo_delta

Signed-off-by: yeya24 <benye@amazon.com>

* changelog

Signed-off-by: yeya24 <benye@amazon.com>

---------

Signed-off-by: yeya24 <benye@amazon.com>
2025-11-11 11:07:08 +01:00
Andrew Hall cc23e3760d Allow for promql tests to compare expected fail message during query preparation
Signed-off-by: Andrew Hall <andrew.hall@grafana.com>
2025-11-11 17:03:35 +08:00
Bryan Boreham c1e0ab11c6 [PERF] TSDB: Speed up intersectPostings.Next
Check if the next position is already a match, in which case we don't
have to call `Seek`.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2025-11-10 17:00:19 +00:00
Bryan Boreham 0e1e7441e4 [PERF] TSDB: ListPostings: check next item before binary search
It is fairly common that the next item is the one we want, and cheap
to check.

We could also start the binary search one position on, but strangely
that slows it down.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2025-11-10 17:00:19 +00:00