ReduceResolution is currently called before validation during
ingestion. This will cause a panic if there are not enough buckets in
the histogram. If there are too many buckets, the spurious buckets are
ignored, and therefore the error in the input histogram is masked.
Furthermore, invalid negative offsets might cause problems, too.
Therefore, we need to do some minimal validation in reduceResolution.
Fortunately, it is easy and shouldn't slow things down. Sadly, it
requires returning errors, which triggers a bunch of code changes.
But even this has a bright side: we can get rid of a few panics.
(Remember: Don't panic!)
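For illustration, a minimal sketch of the kind of check meant here (the
helper name is an assumption, not the actual code):
```go
package histogramsketch

import (
	"fmt"

	"github.com/prometheus/prometheus/model/histogram"
)

// checkSpansVsBuckets illustrates the minimal validation: the spans must
// account for exactly the buckets provided, and only the first span may
// carry a negative offset.
func checkSpansVsBuckets(spans []histogram.Span, numBuckets int) error {
	var want int
	for i, s := range spans {
		if i > 0 && s.Offset < 0 {
			return fmt.Errorf("span %d has negative offset %d", i, s.Offset)
		}
		want += int(s.Length)
	}
	if want != numBuckets {
		return fmt.Errorf("spans need %d buckets, got %d", want, numBuckets)
	}
	return nil
}
```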
In other news, we haven't done a full validation of histograms
read via remote-read. This is not so much a security concern (as you
can throw off Prometheus easily by feeding it bogus data via
remote-read) but more that remote-read sources might be makeshift and
could accidentally create invalid histograms. We really don't want to
panic in that case. So this commit not only adds a check of the
spans and buckets as needed for resolution reduction but also adds
full validation during remote-read.
Signed-off-by: beorn7 <beorn@grafana.com>
Partially fixes https://github.com/prometheus/prometheus/issues/17416 by
renaming all CT* names to ST* in the whole codebase, except for RW2 (done
in a separate [PR](https://github.com/prometheus/prometheus/pull/17411))
and the PrometheusProto exposition proto.
```
CreatedTimestamp -> StartTimestamp
CreatedTimeStamp -> StartTimestamp
created_timestamp -> start_timestamp
CT -> ST
ct -> st
```
Signed-off-by: bwplotka <bwplotka@gmail.com>
TestScrapeLoop_HistogramBucketLimit tests the bucket limiter but it also sets sample_limit to the same value, which seems incorrect.
Signed-off-by: Lukasz Mierzwa <l.mierzwa@gmail.com>
Currently histograms bypass sample_limit logic as the limitAppender only implements the Append() method, while histograms are appended using AppendHistogram.
This means that they are effectively ignored during sample_limit checks, and a scrape with sample_limit=100 and 500 histograms will accept all samples.
Add AppendHistogram method to the limitAppender so histograms are also counted towards sample_limit.
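The change is roughly of this shape (a simplified sketch, not the actual
scrape-package code):
```go
package scrapesketch

import (
	"errors"

	"github.com/prometheus/prometheus/model/histogram"
	"github.com/prometheus/prometheus/model/labels"
	"github.com/prometheus/prometheus/storage"
)

var errSampleLimit = errors.New("sample limit exceeded")

// limitAppender here is a simplified stand-in for the one in the scrape
// package; it counts every appended sample against a fixed budget.
type limitAppender struct {
	storage.Appender
	limit int
	i     int
}

// AppendHistogram counts histogram samples against the same limit as
// float samples instead of passing them through unchecked.
func (app *limitAppender) AppendHistogram(ref storage.SeriesRef, lset labels.Labels, t int64, h *histogram.Histogram, fh *histogram.FloatHistogram) (storage.SeriesRef, error) {
	app.i++
	if app.i > app.limit {
		return 0, errSampleLimit
	}
	return app.Appender.AppendHistogram(ref, lset, t, h, fh)
}
```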
Signed-off-by: Lukasz Mierzwa <l.mierzwa@gmail.com>
This adds:
* A `ScrapePoolConfig()` method to the scrape manager that allows getting
the scrape config for a given pool.
* An API endpoint at `/api/v1/targets/relabel_steps` that takes a pool name
and a label set of a target and returns a detailed list of applied
relabeling rules and their output for each step.
* A "show relabeling" link/button for each target on the discovery page
that shows the detailed flow of all relabeling rules (based on the API
response) for that target.
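Conceptually, the per-step output for the new relabel_steps endpoint could
be computed along these lines (a hedged sketch; the function name and the
actual response shape are assumptions, not the real implementation):
```go
package relabelsketch

import (
	"github.com/prometheus/prometheus/model/labels"
	"github.com/prometheus/prometheus/model/relabel"
)

// relabelSteps applies the pool's relabel configs one at a time and
// records the label set after each rule, stopping if a rule drops the
// target.
func relabelSteps(lbls labels.Labels, cfgs []*relabel.Config) []labels.Labels {
	steps := make([]labels.Labels, 0, len(cfgs))
	for _, cfg := range cfgs {
		var keep bool
		lbls, keep = relabel.Process(lbls, cfg)
		steps = append(steps, lbls)
		if !keep {
			break // the target was dropped at this step
		}
	}
	return steps
}
```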
Note that this changes the JSON encoding of the relabeling rule config
struct to output the original snake_case (instead of camelCase) field names,
and before merging, we need to be sure that's ok :) See my comment about
that at https://github.com/prometheus/prometheus/pull/15383#issuecomment-3405591487
Fixes https://github.com/prometheus/prometheus/issues/17283
Signed-off-by: Julius Volz <julius.volz@gmail.com>
The detailed plan for this is laid out in
https://github.com/prometheus/prometheus/issues/16572.
This commit adds a global and local scrape config option
`scrape_native_histograms`, which has to be set to true to ingest
native histograms.
To ease the transition, the feature flag is changed to simply set the
default of `scrape_native_histograms` to true.
Further implications:
- The default scrape protocols now depend on the
`scrape_native_histograms` setting.
- Everywhere else, histograms are now "on by default".
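To make the intended semantics concrete, a rough sketch of the precedence
(the plumbing shown here is an assumption, not the actual config code):
```go
package configsketch

// scrapeNativeHistograms: an explicit per-scrape-config setting wins,
// otherwise the global setting applies, and the feature flag now merely
// flips the default to true.
func scrapeNativeHistograms(local, global *bool, featureFlagEnabled bool) bool {
	switch {
	case local != nil:
		return *local
	case global != nil:
		return *global
	default:
		return featureFlagEnabled
	}
}
```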
Documentation beyond that for the feature flag and the scrape
config is deliberately left out. See
https://github.com/prometheus/prometheus/pull/17232 for that.
Signed-off-by: beorn7 <beorn@grafana.com>
Applied the analyzer "modernize" to the test files.
$ go run golang.org/x/tools/gopls/internal/analysis/modernize/cmd/modernize@latest -fix -test ./...
Signed-off-by: beorn7 <beorn@grafana.com>
Histogram.Validate and FloatHistogram.Validate now return an error on
unsupported schemas.
The scrape and remote-write handlers reduce the schema to the maximum
allowed if it is above that maximum but below the theoretical maximum of 52.
For scrape, the maximum is a configuration option; for remote-write, it is 8.
Note: The OTLP endpoint already does the reduction, without checking that it
is below 52, as the spec does not specify a maximum.
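The clamping is roughly of this shape (a sketch; the helper name and exact
error handling are assumptions):
```go
package schemasketch

import (
	"fmt"

	"github.com/prometheus/prometheus/model/histogram"
)

// clampSchema: maxSchema is the configured limit for scrape and 8 for
// remote-write.
func clampSchema(h *histogram.Histogram, maxSchema int32) (*histogram.Histogram, error) {
	switch {
	case h.Schema <= maxSchema:
		return h, nil // already at or below the allowed resolution
	case h.Schema > 52:
		// Beyond the theoretical maximum: treat as invalid input.
		return nil, fmt.Errorf("unsupported histogram schema %d", h.Schema)
	default:
		// Too fine-grained but structurally valid: merge buckets down to
		// the allowed schema.
		return h.ReduceResolution(maxSchema), nil
	}
}
```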
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
Hide adding the NHCB parser on top of another parser in the New() function
so we can easily add directly NHCB-capable parsers.
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
This was added in Go 1.21, and is neater than a loop deleting all
elements.
Also move the comment noting why we do this, because it could be read
as saying this is the only reason we have two maps.
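For reference, assuming this refers to the clear builtin added in Go 1.21:
```go
package mapsketch

func resetOld(m map[string]int) {
	for k := range m {
		delete(m, k) // old way: delete every key in a loop
	}
}

func resetNew(m map[string]int) {
	clear(m) // new way: empties the map in place in one call
}
```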
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
Instead of the labels hash, which could collide between two different
series, use the SeriesRef which is unique.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
`collectResultAppender` on its own will now use the labels hash
instead of a random number. This avoids the situation where a series
could be added twice under different references.
When there is an underlying appender, use the reference it generates.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
* fix(nhcb): flaky test TestConvertClassicHistogramsToNHCB
The test was e2e, including actually scraping an HTTP endpoint and running
the scrape loop. This led to some timing issues.
I've simplified it to call the scrape loop append directly. I think that
this isn't nice, as that is a private interface, but it should get rid of
the flakiness, and there's already a bunch of tests doing this.
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
See
https://pkg.go.dev/golang.org/x/tools/gopls/internal/analysis/modernize
for details.
This ran into a few issues (arguably bugs in the modernize tool),
which I will fix in the next commit, so that we have transparency about
what was done automatically.
Beyond those hiccups, I believe all the changes applied are
legitimate. Even where there might be no tangible direct gain, I would
argue it's still better to use the "modern" way to avoid micro
discussions in tiny style PRs later.
Signed-off-by: beorn7 <beorn@grafana.com>
Go will not allocate when reading from a map with a key cast from []byte to string.
Also remove some yoloString calls in package `textparse` - call a more suitable library function.
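For reference, this is the optimization in question:
```go
package mapkeysketch

// The Go compiler recognises a []byte-to-string conversion that is used
// only as a map lookup key and skips the string allocation.
func lookup(m map[string]int, key []byte) (int, bool) {
	v, ok := m[string(key)] // no allocation here
	return v, ok
}
```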
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
* close sync ch after sender() is stopped
* break if chan is closed
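A generic sketch of the pattern described by these two bullets (not the
actual code):
```go
package main

import (
	"fmt"
	"sync"
)

func main() {
	ch := make(chan int)

	var wg sync.WaitGroup
	wg.Add(1)
	go func() { // the sender
		defer wg.Done()
		for i := 0; i < 3; i++ {
			ch <- i
		}
	}()

	go func() {
		wg.Wait() // wait until the sender has stopped ...
		close(ch) // ... and only then close the channel
	}()

	for {
		v, ok := <-ch
		if !ok {
			break // channel closed: stop receiving
		}
		fmt.Println(v)
	}
}
```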
Signed-off-by: liyandi <littlepangdi@163.com>
Co-authored-by: liyandi <liyandi@xiaomi.com>
Fixes #16689
Well, maybe not 100%, but it should improve it.
Increase the scrape timeout to be more tolerant of slow tests and
also use `Eventually` when checking for targets.
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
The scrape cache is used to emit StaleNaN markers after a series disappears, so it should only hold entries for series that actually ended up in the TSDB; due to sample_limit that is not always the case.
Signed-off-by: Lukasz Mierzwa <l.mierzwa@gmail.com>
Tests that look at samples with StaleNaN values will fail because these samples are generated by iterating over a map, so their order can be unstable.
Signed-off-by: Lukasz Mierzwa <l.mierzwa@gmail.com>
I was confused why there are no StaleNaN markers appended when a scrape hits sample_limit, but reading the code I see that's expected, so add a test for it.
Signed-off-by: Lukasz Mierzwa <l.mierzwa@gmail.com>
Currently, staleness markers are appended for any sample that disappears from the scrape cache, even if that sample was never appended to the TSDB.
When staleness markers are appended, they always use ref=0 as the SeriesRef, so the downstream appender doesn't know whether the sample is for a known series or not.
This changes the scrape cache so the map used for staleness tracking stores the cache entry instead of only the label set. Having the cache entry means:
- we can ignore stale samples that didn't end up in the TSDB (not in the scrape cache)
- we can append them to the TSDB using the correct ref value, so the appender knows whether they are for a known or an unknown series
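A rough sketch of the shape of this change (types simplified; not the
actual scrape cache code):
```go
package stalesketch

import (
	"math"

	"github.com/prometheus/prometheus/model/labels"
	"github.com/prometheus/prometheus/model/value"
	"github.com/prometheus/prometheus/storage"
)

// cacheEntry is a simplified stand-in for the scrape cache's entry type.
type cacheEntry struct {
	ref  storage.SeriesRef
	lset labels.Labels
}

// appendStaleMarkers sketches the new behaviour: only series that have a
// cache entry (i.e. that were actually stored) get a StaleNaN marker, and
// the stored ref is passed on instead of 0 so the appender can tell known
// from unknown series.
func appendStaleMarkers(app storage.Appender, ts int64, prev, cur map[string]*cacheEntry) error {
	for key, ce := range prev {
		if _, ok := cur[key]; ok {
			continue // still present in the latest scrape, not stale
		}
		if _, err := app.Append(ce.ref, ce.lset, ts, math.Float64frombits(value.StaleNaN)); err != nil {
			return err
		}
	}
	return nil
}
```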
Signed-off-by: Lukasz Mierzwa <l.mierzwa@gmail.com>