The current code stops the walk after we have found the first relevant
function. However, in expressions with multiple legs, we will then use
the `HistogramStatsIterator` at most once. This change should make
sure we explore all legs.
The added tests make sure we are not using `HistogramStatsIterator`
where we shouldn't (but the opposite can only be seen in a benchmark
or with a more explicit test).
Signed-off-by: beorn7 <beorn@grafana.com>
Prior to #17127, we needed to add another level in the AST to trigger
the usage of `HistogramStatsIterator`. This is fixed now.
Signed-off-by: beorn7 <beorn@grafana.com>
Previously, multiple calls returned a wrong counter reset hint.
This commit also includes a bunch of refactorings that partially have
value on their own. However, the need for them was triggered by the
additional work needed for idempotency, so I included them in this
commit.
Signed-off-by: beorn7 <beorn@grafana.com>
* fix(textparse): implement NHCB parsing in ProtoBuf parser directly
The NHCB conversion does some validation, but we can only return error
from Parser.Next() not Parser.Histogram(). So the conversion needs to
happen in Next().
There are 2 cases:
1. "always_scrape_classic_histograms" is enabled, in which case we
convert after returning the classic series. This is to be consistent
with the PromParser text parser, which collects NHCB while spitting out
classic series; then returns the NHCB.
2. "always_scrape_classic_histograms" is disabled. In which case we never
return the classic series.
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
* refactor(textparse): skip classic series instead of adding NHCB around
Do not return the first classic series from the EntryType state,
switch to EntrySeries. This means we need to start the histogram
field state from -3 , not -2.
In EntrySeries, skip classic series if needed.
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
* reuse nhcb converter
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
* test(textparse/nhcb): test corrupting metric family name
NHCB parse doesn't always copy the metric name from the underlying
parser. When called via HELP, UNIT, the string is directly referenced
which means that the read-ahead of NHCB can corrupt it.
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
* OTLP writer writes directly to appender
Do not convert to Remote-Write 1.0 protocol. Convert to TSDB Appender interface instead.
For downstream projects that still convert OTLP to something else (e.g. Mimir using
its own RW 1.0+2.0 compatible protocol), introduce a compatibility layer between
OTLP decoding and TSDB Appender. This is the CombinedAppender that hides the
implementation. Name is subject to change.
---------
Signed-off-by: David Ashpole <dashpole@google.com>
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
Signed-off-by: George Krajcsovits <krajorama@users.noreply.github.com>
Co-authored-by: David Ashpole <dashpole@google.com>
Co-authored-by: Jesus Vazquez <jesusvazquez@users.noreply.github.com>
Co-authored-by: Arve Knudsen <arve.knudsen@gmail.com>
Hide adding NHCB parser on top another parser in New() function
so we can easily add direct NHCB capable parsers.
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
Remote Write one currently attempts to send native histograms with
custom buckets, but these are not actually supported in RW1 protocol.
Drop, measure and log instead.
Fixes: #17140
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
After an effective Seek, the lastFH isn't the lastFH anymore, so we
should nil it out.
In practice, this should only matter is sub-queries, because we are
otherwise not interested in a counter reset of the first sample
returned after a Seek.
Sub-queries, on the other hand, always do their own counter reset
detection. (For that, they would prefer to see the whole histogram, so
that's another problem for another commit.)
Signed-off-by: beorn7 <beorn@grafana.com>
The `HistogramStatsIterator` is only meant to be used within PromQL.
PromQL only ever uses float histograms. If `HistogramStatsIterator` is
capable of handling integer histograms, it will still be used, for
example by the `BufferedSeriesIterator`, which buffers samples and
will use an integer `Histogram` for it, if the underlying chunk is an
integer histogram chunk (which is common).
However, we can simply intercept the `Next` and `Seek` calls and
pretend to only ever be able te return float histograms. This has the
welcome side effect that we do not have to handle a mix of float and
integer histograms in the `HistogramStatsIterator` anymore.
With this commit, the `AtHistogram` call has been changed to panic so
that we ensure it is never called.
Benchmark differences between this and the previous commit:
name old time/op new time/op delta
NativeHistograms/histogram_count_with_short_rate_interval-16 837ms ± 3% 616ms ± 2% -26.36% (p=0.008 n=5+5)
NativeHistograms/histogram_count_with_long_rate_interval-16 1.11s ± 1% 0.91s ± 3% -17.75% (p=0.008 n=5+5)
NativeHistogramsCustomBuckets/histogram_count_with_short_rate_interval-16 751ms ± 6% 581ms ± 1% -22.63% (p=0.008 n=5+5)
NativeHistogramsCustomBuckets/histogram_count_with_long_rate_interval-16 1.13s ±11% 0.85s ± 2% -24.59% (p=0.008 n=5+5)
name old alloc/op new alloc/op delta
NativeHistograms/histogram_count_with_short_rate_interval-16 531MB ± 0% 148MB ± 0% -72.08% (p=0.008 n=5+5)
NativeHistograms/histogram_count_with_long_rate_interval-16 528MB ± 0% 145MB ± 0% -72.60% (p=0.016 n=5+4)
NativeHistogramsCustomBuckets/histogram_count_with_short_rate_interval-16 452MB ± 0% 145MB ± 0% -67.97% (p=0.016 n=5+4)
NativeHistogramsCustomBuckets/histogram_count_with_long_rate_interval-16 452MB ± 0% 141MB ± 0% -68.70% (p=0.016 n=5+4)
name old allocs/op new allocs/op delta
NativeHistograms/histogram_count_with_short_rate_interval-16 8.95M ± 0% 1.60M ± 0% -82.15% (p=0.008 n=5+5)
NativeHistograms/histogram_count_with_long_rate_interval-16 8.84M ± 0% 1.49M ± 0% -83.16% (p=0.008 n=5+5)
NativeHistogramsCustomBuckets/histogram_count_with_short_rate_interval-16 5.96M ± 0% 1.57M ± 0% -73.68% (p=0.008 n=5+5)
NativeHistogramsCustomBuckets/histogram_count_with_long_rate_interval-16 5.86M ± 0% 1.46M ± 0% -75.05% (p=0.016 n=5+4)
Signed-off-by: beorn7 <beorn@grafana.com>
PR #16702 introduced a regression because it was too strict in
detecting the condition for using the `HistogramStatsIterator`. It
essentially required the triggering function to be buried at least one
level deep.
`histogram_count(sum(rate(native_histogram_series[2m]))` would not
trigger anymore, but
`1*histogram_count(sum(rate(native_histogram_series[2m]))` would.
Ironically, PR #16682 made the performance of the
`HistogramStatsIterator` so much worse that _not_ using it was often
better, but this has to be addressed in a separate commit.
This commit reinstates the previous `HistogramStatsIterator` detection
behavior, as PR #16702 intended to keep it.
Relevant benchmark changes with this commit (i.e. old is without using
`HistogramStatsIterator`, new is with `HistogramStatsIterator`):
name old time/op new time/op delta
NativeHistograms/histogram_count_with_short_rate_interval-16 802ms ± 3% 837ms ± 3% +4.42% (p=0.008 n=5+5)
NativeHistograms/histogram_count_with_long_rate_interval-16 1.22s ± 3% 1.11s ± 1% -9.46% (p=0.008 n=5+5)
NativeHistogramsCustomBuckets/histogram_count_with_short_rate_interval-16 611ms ± 5% 751ms ± 6% +22.87% (p=0.008 n=5+5)
NativeHistogramsCustomBuckets/histogram_count_with_long_rate_interval-16 975ms ± 4% 1131ms ±11% +16.04% (p=0.008 n=5+5)
name old alloc/op new alloc/op delta
NativeHistograms/histogram_count_with_short_rate_interval-16 222MB ± 0% 531MB ± 0% +139.63% (p=0.008 n=5+5)
NativeHistograms/histogram_count_with_long_rate_interval-16 323MB ± 0% 528MB ± 0% +63.81% (p=0.008 n=5+5)
NativeHistogramsCustomBuckets/histogram_count_with_short_rate_interval-16 179MB ± 0% 452MB ± 0% +153.07% (p=0.016 n=4+5)
NativeHistogramsCustomBuckets/histogram_count_with_long_rate_interval-16 175MB ± 0% 452MB ± 0% +157.73% (p=0.016 n=4+5)
name old allocs/op new allocs/op delta
NativeHistograms/histogram_count_with_short_rate_interval-16 4.48M ± 0% 8.95M ± 0% +99.51% (p=0.008 n=5+5)
NativeHistograms/histogram_count_with_long_rate_interval-16 5.02M ± 0% 8.84M ± 0% +75.89% (p=0.008 n=5+5)
NativeHistogramsCustomBuckets/histogram_count_with_short_rate_interval-16 3.00M ± 0% 5.96M ± 0% +98.93% (p=0.008 n=5+5)
NativeHistogramsCustomBuckets/histogram_count_with_long_rate_interval-16 2.89M ± 0% 5.86M ± 0% +102.69% (p=0.016 n=4+5)
Signed-off-by: beorn7 <beorn@grafana.com>
- Add a code comment about a counter reset edge case (which is
hopefully not relevant in practice).
- Rename the receiver from `f` to `hsi`. (`f` seemed like completely
off as a name. `i` or `it` might have worked, too, but I ended up
with `hsi` as the easiest for the reader.)
Signed-off-by: beorn7 <beorn@grafana.com>
So far, we emitted a `HistogramCounterResetCollisionWarning` when
encountering conflicting counter resets in the calculation of (i)rate
and friends. We even tested for that. However, in the rate
calculation, we are not interested in those collisions. They are
actually expected.
On the other hand, we did not warn about those collisions when doing a
`sum` aggregation, where such a warning would be appropriate.
This commit removes the warning in the former case and adds it in the
latter. Sadly, we cannot really test this as we still remove the
counter reset hint for the first sample in a chunk. (And that's the
only sample where we could get a `NotCounterReset` hint.)
Signed-off-by: beorn7 <beorn@grafana.com>
This is an attempt to make sure that we are not accidentally warning
about conflicting counter resets in rate calculation, see
https://github.com/prometheus/prometheus/pull/17051#issuecomment-3226503416 .
This is done by being more explicit about the warn expectation.
However, as long as
https://github.com/prometheus/prometheus/issues/15346 is not
addressed, we won't be able to trigger the annotation this way anyway.
However, we can play a trick, by wrapping a suitable expression in
`histogram_count` or `histogram_sum`, which will invoke the
`HistogramStatsIterator`, which in turn creates counter reset hints on
the fly. So this commit also adds tests with that, both for absence of
an annotation with `rate` and presence of an annotation with
`sum_over_time`.
Signed-off-by: beorn7 <beorn@grafana.com>
test tbs
Signed-off-by: beorn7 <beorn@grafana.com>
* fix(nhcb): flaky test TestConvertClassicHistogramsToNHCB
The test was e2e, including actually scraping an HTTP endpoint and running
the scrape loop. This led to some timing issues.
I've simplified it to call the scrape loop append directly. I think that
this isn't nice as that is a private interface, but should gets rid of the
flakiness and there's already a bunch of test doing this.
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
Add further tests for first_over_time (also covering existing
last_over_time, count_over_time, etc) to exercise vectors
containing a mix of float and histogram samples where the
histogram samples do not come last in the series.
This tripped over https://github.com/prometheus/prometheus/issues/17025
so it's structured a bit oddly to work around that bug in the
appender as used by promtest.
Signed-off-by: Craig Ringer <craig.ringer@enterprisedb.com>
Add a first_over_time function, and corresponding ts_of_first_over_time
function. Both are behind the experimental functions feature flag.
Signed-off-by: Craig Ringer <craig.ringer@enterprisedb.com>
Because of relabelling, an endpoint can only select a subset of series
that go through WriteStorage
Having a highestTimestamp at WriteStorage level yields wrong values
if the corresponding sample won't even make it to a remote queue.
Currently PrometheusRemoteWriteBehind is based on that, and would fire
if an endpoint is only interested in a subset of series that take time
to appear.
A "prometheus_remote_storage_queue_highest_timestamp_seconds" that only
takes into account samples in the queue is introduced, and used in
PrometheusRemoteWriteBehind and dashboards in documentation/prometheus-mixin
Same applies to samplesIn/dataIn, QueueManager should know more about
when to update those; when data is enqueued.
That makes dataDropped unnecessary, thus help simplify the logic
in QueueManager.calculateDesiredShards()
Signed-off-by: machine424 <ayoubmrini424@gmail.com>