Commit Graph

16027 Commits

Author SHA1 Message Date
Ayoub Mrini 9cbb3a66c9
Merge pull request #17063 from machine424/muttt
test(notifier): add a test showing an alert mutation bug between alertmanager_config and fix it
2025-08-27 11:13:00 +02:00
pipiland2612 0246aa22f4 Parralel test
Signed-off-by: pipiland2612 <nguyen.t.dang.minh@gmail.com>
2025-08-27 10:47:39 +02:00
Darkknight 9fc4212214
revert unexpected metadata metric fopr RWV2 and add log on unexpected metadata instead. (#17082)
Signed-off-by: leegin <leegin.t@gmail.com>
2025-08-26 11:54:14 -07:00
bragi92 20580b6ba8
remote_write azure auth : add workload identity support (#16788)
* initial changes

Signed-off-by: Kaveesh Dubey <kadubey@microsoft.com>

* .

Signed-off-by: Kaveesh Dubey <kadubey@microsoft.com>

* fix comments

Signed-off-by: Kaveesh Dubey <kadubey@microsoft.com>

* fix tenantid test

Signed-off-by: Kaveesh Dubey <kadubey@microsoft.com>

* style

Signed-off-by: Kaveesh Dubey <kadubey@microsoft.com>

* Update storage/remote/azuread/azuread.go

Co-authored-by: Bartlomiej Plotka <bwplotka@gmail.com>
Signed-off-by: bragi92 <kadubey@microsoft.com>

* Update storage/remote/azuread/azuread.go

Co-authored-by: Bartlomiej Plotka <bwplotka@gmail.com>
Signed-off-by: bragi92 <kadubey@microsoft.com>

* Update storage/remote/azuread/azuread.go

Co-authored-by: Bartlomiej Plotka <bwplotka@gmail.com>
Signed-off-by: bragi92 <kadubey@microsoft.com>

* Update storage/remote/azuread/azuread.go

Co-authored-by: Bartlomiej Plotka <bwplotka@gmail.com>
Signed-off-by: bragi92 <kadubey@microsoft.com>

* Update storage/remote/azuread/azuread.go

Co-authored-by: Bartlomiej Plotka <bwplotka@gmail.com>
Signed-off-by: bragi92 <kadubey@microsoft.com>

* Update storage/remote/azuread/azuread.go

Co-authored-by: Bartlomiej Plotka <bwplotka@gmail.com>
Signed-off-by: bragi92 <kadubey@microsoft.com>

* Update storage/remote/azuread/azuread.go

Co-authored-by: Bartlomiej Plotka <bwplotka@gmail.com>
Signed-off-by: bragi92 <kadubey@microsoft.com>

* pr feedback

Signed-off-by: Kaveesh Dubey <kadubey@microsoft.com>

---------

Signed-off-by: Kaveesh Dubey <kadubey@microsoft.com>
Signed-off-by: bragi92 <kadubey@microsoft.com>
Co-authored-by: Bartlomiej Plotka <bwplotka@gmail.com>
2025-08-26 07:14:47 +01:00
machine424 8f79470ca9
fix(notifier): create a new alert when relabeling alters labels
Signed-off-by: machine424 <ayoubmrini424@gmail.com>
2025-08-26 07:37:45 +02:00
machine424 bd725fd6b8
test(notifier): add a test showing an alert mutation bug between alertmanager_config (alertmanagersets)
The alert_relabel_configs should only apply to the corresponding alertmanagerset

Signed-off-by: machine424 <ayoubmrini424@gmail.com>
2025-08-25 17:04:14 +02:00
Darkknight 7cf585527f
remote_write: add metric for unexpected metadata in populateV2TimeSeries (#17034)
add metric to track unexpected metadata seen in populateV2TimeSeries, which would indicate metadata incorrectly routed in queue_manager code paths

---------

Signed-off-by: leegin <leegin.t@gmail.com>
Signed-off-by: Darkknight <leegin.t@gmail.com>
2025-08-22 10:33:52 -07:00
Minh Nguyen c8deefb038
[tsdb] Add CounterResetHint: CounterReset to synthetic zero sample (#17011)
Signed-off-by: pipiland2612 <nguyen.t.dang.minh@gmail.com>
2025-08-21 23:26:01 +02:00
Marco Pracucci 954cad35b2
Optimise concurrent rule evaluation for rules querying ALERTS and ALERTS_FOR_STATE (#17064)
* Optimise concurrent rule evaluation for rules querying ALERTS and ALERTS_FOR_STATE

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Further optimised the case of ALERTS and ALERTS_FOR_STATE without alertname label matcher

Signed-off-by: Marco Pracucci <marco@pracucci.com>

---------

Signed-off-by: Marco Pracucci <marco@pracucci.com>
2025-08-21 16:57:57 +02:00
Bryan Boreham 8b3f59e9c3
Merge pull request #16593 from bboreham/ast-child-iter
[PERF] PromQL: Reduce allocations when walking syntax tree
2025-08-21 09:14:41 +01:00
Björn Rabenstein 7e37066994
Merge pull request #17051 from juliusmh/nh_counter_reset_hint_collision_annotation
Histograms: set annotation when adding or subtracting histograms that have `not_reset` and `reset` hints.
2025-08-20 17:24:12 +02:00
Julius Hinze cdf7208478
annotations: histogram counter reset warning includes operation
Signed-off-by: Julius Hinze <julius.hinze@grafana.com>
2025-08-20 15:14:21 +02:00
Julius Hinze 77b5c3f217
Histograms: set annotation when adding or subtracting histograms that have `not_reset` and `reset` hints.
Signed-off-by: Julius Hinze <julius.hinze@grafana.com>
2025-08-20 15:00:45 +02:00
Bryan Boreham 498f63e60b
Merge pull request #17029 from pr00se/wal-checkpoint-dropped-samples
TSDB: use timestamps rather than WAL segment numbers to track how long deleted series should be retained in checkpoints
2025-08-20 11:15:10 +01:00
Bartlomiej Plotka 5dc3c976b4
Merge pull request #17061 from prometheus/not-parallel
[TESTS] remote-write: Make TestShutdown non-parallel to reduce flakes.
2025-08-20 09:03:45 +01:00
Ganesh Vernekar a86d9a3858
Merge pull request #16925 from prometheus/codesome/stale-series-tracking
tsdb: Track stale series in the Head block based on stale sample
2025-08-19 15:35:19 -07:00
Patryk Prus bbc9e47e42
Add comment about differences between agent mode and regular Prometheus
Signed-off-by: Patryk Prus <p@trykpr.us>
2025-08-19 18:33:52 -04:00
Ganesh Vernekar 3904b3cd5f Restore stale series count from chunk snapshots
Signed-off-by: Ganesh Vernekar <ganesh.vernekar@reddit.com>
2025-08-19 15:07:37 -07:00
Ganesh Vernekar b29ce3e489 Restore stale series count on WAL replay
Signed-off-by: Ganesh Vernekar <ganesh.vernekar@reddit.com>
2025-08-19 15:07:37 -07:00
Ganesh Vernekar 0c3d3d7466 Test the stale series tracking in Head
Signed-off-by: Ganesh Vernekar <ganesh.vernekar@reddit.com>
2025-08-19 15:07:37 -07:00
Ganesh Vernekar 7a947d3629 Track stale series in the Head
Signed-off-by: Ganesh Vernekar <ganesh.vernekar@reddit.com>
2025-08-19 15:07:27 -07:00
Bryan Boreham a3c4a9bd18 [TESTS] remote-write: Make TestShutdown non-parallel to reduce flakes.
Resolves #17045.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2025-08-19 18:04:20 +01:00
Justin Jung 0f98dcbc07
Engine: Allow error response code to be customized (#16257)
Currently the API always returns http code 422 for engine execution error, and

This PR allows the error code to be overriden, based on the ErrorType and the error itself.

Signed-off-by: Justin Jung <jungjust@amazon.com>
Signed-off-by: Justin Jung <justinjung04@gmail.com>
Co-authored-by: Ayoub Mrini <ayoubmrini424@gmail.com>
2025-08-19 16:43:47 +01:00
Bartlomiej Plotka 93bbf4bc90
Merge pull request #17041 from bernot-dev/remove-queue-manager-startup-benchmark
test: remove obsolete queue manager test
2025-08-18 17:06:39 +01:00
Arve Knudsen 0a40df33fb
Make metric/label name validation scheme explicit (#16928)
* Parameterize metric/label name validation scheme

Parameterized metric/label name validation scheme

---------

Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
Co-authored-by: Julius Hinze <julius.hinze@grafana.com>
2025-08-18 08:09:00 +00:00
Arve Knudsen 68d0d3eee3
Remote write: Return after writing error response for invalid compression (#17050)
* Remote write: Return after writing error response for invalid compression

Fix remote write HTTP handler to return after writing error response for
invalid compression (non-Snappy).

---------

Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
2025-08-17 15:18:47 +00:00
Bryan Boreham af588174b0
Merge pull request #17052 from vicwicker/fix-chunk-encoding-docs
docs: Fix chunk format documentation for `varint` encoding
2025-08-17 09:35:19 +01:00
Adam Bernot 575a60ec92
test: fix flaky test
A race condition in TestSendSamplesWithBackoffWithSampleAgeLimit was
observed in CI where the sample age limit was too close to the backoff
time, causing samples to be dropped intermittently. Increasing the
SampleAgeLimit resolves the problem.

Signed-off-by: Adam Bernot <bernot@google.com>
2025-08-15 18:58:27 +00:00
Björn Rabenstein e8f650b00c
Merge pull request #17049 from prometheus/beorn7/doc
docs: counter vs. gauge histogram behavior with `+`/`-`
2025-08-15 11:19:51 +02:00
Victor Herrero Otal 0cbbc9b7d3 docs: Fix chunk format documentation for `varint` encoding
While preparing PR #16701, we identified an inconsistency in the chunk
format documentation. The `varint` encoding can require up to 10 bytes
for a 64-bit integer, such as when timestamps are encoded. However, the
chunk length field is a 32-bit integer, which requires at most 5 bytes
in `varint` encoding.

This is reflected in the code, where a maximum of 5 bytes are read when
parsing the chunk length.

    50ba25f273/tsdb/chunks/chunks.go (L709-L711)

    50ba25f273/tsdb/chunks/chunks.go (L47-L48)

Co-authored-by: Istvan Zoltan Ballok <istvan.zoltan.ballok@sap.com>
Signed-off-by: Victor Herrero Otal <victor.herrero.otal@sap.com>
2025-08-15 10:56:21 +02:00
Björn Rabenstein 1c002c5669
Merge pull request #17048 from prymitive/parserErr
ENHANCEMENT: Refactor TestParseExpressions to be more explicit about errors
2025-08-14 16:04:16 +02:00
Dimitar Dimitrov d94dab92a8
remote.ReadClient: allow multiple queries (#16742)
* remote read: simplify ReadMultiple to return single SeriesSet

Changed ReadMultiple to return a single SeriesSet with interleaved
series from all queries instead of a slice of SeriesSets. This
simplifies the interface and removes the complex multiplexing
infrastructure while maintaining the ability to send multiple
queries in a single HTTP request.

Changes:
- Updated ReadClient interface: ReadMultiple now returns storage.SeriesSet
- Removed multiplexing infrastructure (MessageQueue, QueueConsumer, etc.)
- Simplified response handling to interleave series from all queries
- Updated tests to match new interface
- All existing tests pass

Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>

* Fix sorting behavior in ReadMultiple for samples responses

When sortSeries=false, the previous implementation incorrectly used
storage.NewMergeSeriesSet which requires sorted inputs, violating the
function's contract and potentially producing incorrect results.

Changes:
- When sortSeries=true: Use NewMergeSeriesSet for efficient merging and
  deduplication of sorted series
- When sortSeries=false: Use simple concatenation to avoid the sorted
  input requirement, preserving duplicates from overlapping queries
- Add comprehensive tests to verify both sorting behaviors
- Update existing test expectations to match correct sorted order

Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>

* Refactor to reduce code duplication in ReadMultiple implementation

Extract common query result combination logic into a shared
combineQueryResults function that handles both sorted and unsorted
cases. This eliminates duplication between the real client
implementation and the mock client used in tests.

Changes:
- Add combineQueryResults helper function in client.go
- Refactor handleSamplesResponseImpl to use the helper
- Simplify mockedRemoteClient.ReadMultiple to use the same helper
- Reduce code duplication by ~30 lines while maintaining same functionality


Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
2025-08-14 15:00:07 +02:00
Lukasz Mierzwa 7e22d2e5c0 Refactor TestParseExpressions to be more explicit about errors
Right now TestParseExpressions tests if a query returns an error but it only does a fuzzy check on returned errors.
The error returned by the parser is ParseErrors, which is a slice of ParseErr structs.
The Error() method on ParseErrors will return an error string based on the first error in that slice. This hides other returned errors so we can end up with bogus errors being returned but won't ever find this via this test.
This change makes the test compare returned error (which is always ParseErrors type) with expected ParseErrors slice.
The extra benefit of this is that current tests mostly ignore error positional range and only test for correct error message. Now errors must return expected positional information.
There are a few cases uncovered where the positional informatio of errors seems wrong, added FIXME for these lines.

Signed-off-by: Lukasz Mierzwa <l.mierzwa@gmail.com>
2025-08-14 10:01:44 +01:00
Björn Rabenstein 17d5d80c80
Merge pull request #17047 from shk1999/docs
chore(configuration.md): fix unit_testing_rules typo
2025-08-13 19:44:38 +02:00
beorn7 1071c82b42 docs: counter vs. gauge histogram behavior with `+`/`-`
This mostly handles the cases mentioned in #16576. However, there are
some related changes in here, too:

- Some line formatting to avoid lines longer than 80 characters.

- Establish in basics.md that histograms have a counter vs. gauge
  "flavor" that is also stored in the sample and not just by
  convention as for float samples.

- Add the documentation of the unary minus, which was missing so far.
  This require a bit of restructuring.

- Cleaned up a few references to "Prometheus" that should better refer
  to "PromQL" (and "Prometheus's query language" → "PromQL" etc.).

I decided to not explain in all detail when and how PromQL detects an
incompatible counter reset. The spec is linked from basics.md, so the
minority that might be interested in this can still look it up.

Signed-off-by: beorn7 <beorn@grafana.com>
2025-08-13 19:25:17 +02:00
Lukasz Mierzwa 7b308dc7fe Add a note about PositionRange values
Signed-off-by: Lukasz Mierzwa <l.mierzwa@gmail.com>
2025-08-13 17:57:40 +01:00
Neeraj Gartia 2c0de4e7c2
Fix `histogram_quantile` annotation in range query when delayed name removal is disabled (#16794)
Signed-off-by: Neeraj Gartia <neerajgartia211002@gmail.com>
2025-08-13 18:06:48 +02:00
Björn Rabenstein 4217d4ba46
Merge pull request #17046 from prometheus/beorn7/promql
promqltest: Add test for unary minus with native histograms
2025-08-13 16:57:09 +02:00
George Krajcsovits 2aaeefe8e1
fix(parser): wrong end position aggregate expression (#17031)
* fix(parser): wrong end position aggregate expression

Fixes: https://github.com/prometheus/prometheus/issues/16053

The position range of nested aggregate expression was wrong, for the
expression "(sum(foo))" the position of "sum(foo)" should be 1-9, but
the parser could not decide the end of the expression on pos 9, instead
it read ahead to pos 10 and then emitted the aggregate. But we only
kept the last closing position (10) and wrote that into the aggregate.

The reason for this is that the parser cannot know from "(sum(foo)" alone
if the aggregate is finished. It could be finished as in "(sum(foo))" but
equally it could continue with group modifier as "(sum(foo) by (bar))".

Previous fix in #16041 tried to keep track of parenthesis, but that is
complicated because the error happens after closing two parenthesis. That
fix introduced new bugs.

This fix now addresses the issue directly. Since we have to step outside
the parser state machine anyway, we can just add an algorithm to
detect and fix the issue. That's Lexer.findPrevRightParen().

Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
2025-08-13 15:52:52 +02:00
beorn7 e326049e43 promqltest: Add test for unary minus with native histograms
This verifies that a counter histogram becomes a gauge histogram if an
unary minus is applied to it.

Signed-off-by: beorn7 <beorn@grafana.com>
2025-08-13 15:39:52 +02:00
Ayoub Mrini dd6ad8ec4c
feat: add a way to pass release notes from the PR (#16904)
* feat: add a way to add release notes from the PR

make the release note block part of .github/PULL_REQUEST_TEMPLATE.md (inspired from k8s')

A CI check would check the input.

Signed-off-by: machine424 <ayoubmrini424@gmail.com>

* imp

Signed-off-by: machine424 <ayoubmrini424@gmail.com>

* suggestions

Signed-off-by: machine424 <ayoubmrini424@gmail.com>

---------

Signed-off-by: machine424 <ayoubmrini424@gmail.com>
2025-08-13 15:34:36 +02:00
Björn Rabenstein caf24555a8
Merge pull request #17004 from juliusmh/nh_unary_minus_as_gauge
model: set native histogram GaugeType hint when subtracting or multiplying/dividing with negative factors
2025-08-13 15:25:06 +02:00
shk1999 6851b1095c chore(configuration.md): fix unit_testing_rules typo
Signed-off-by: shk1999 <sh.karimi@vasl.ir>
2025-08-13 16:48:28 +03:30
Bryan Boreham 8fff489c53
Merge pull request #16896 from bboreham/wrap-less
[PERF] PromQL: Move more work to preprocessing step
2025-08-13 14:17:30 +01:00
Bryan Boreham 7ab68550dc [PERF] PromQL: Unwrap superfluous parens during preprocessing
This means we only do it once, rather than on every step of a range
query. Also the code gets a bit shorter.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2025-08-13 14:01:47 +01:00
Bryan Boreham 384db72ede [PERF] PromQL: Stop checking step-invariant arguments
In aggregations and function calls. We no longer wrap the literal values
or matrix selectors that would appear here.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2025-08-13 14:01:47 +01:00
Bryan Boreham 94d3cac4ea [PERF] PromQL: Don't wrap matrix selectors as time-invariant
Matrix selectors have a Timestamp which indicates they are time-invariant,
so we don't need to wrap and then unwrap them when we come to use them.

Fix up tests that check this level of detail.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2025-08-13 14:01:47 +01:00
Bryan Boreham b290e0ec17 [PERF] PromQL: Don't wrap constant expressions as time-invariant
This should mean we can stop unwrapping them later.

Fix up tests that check very specific details.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2025-08-13 10:54:40 +01:00
Julius Hinze 5855d973b0
model: set native histogram GaugeType hint when subtracting or multiplying/dividing with negative factors
Signed-off-by: Julius Hinze <julius.hinze@grafana.com>
2025-08-12 18:16:39 +02:00
Adam Bernot 8cf67d99ba
test: remove obsolete test
As mentioned in #16182, the BenchmarkStartup test for the queue manager
covers an old API and uses settings that will not occur in production

Signed-off-by: Adam Bernot <bernot@google.com>
2025-08-12 15:36:07 +00:00