* fix(textparse): implement NHCB parsing in ProtoBuf parser directly
The NHCB conversion does some validation, but we can only return error
from Parser.Next() not Parser.Histogram(). So the conversion needs to
happen in Next().
There are 2 cases:
1. "always_scrape_classic_histograms" is enabled, in which case we
convert after returning the classic series. This is to be consistent
with the PromParser text parser, which collects NHCB while spitting out
classic series; then returns the NHCB.
2. "always_scrape_classic_histograms" is disabled. In which case we never
return the classic series.
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
* refactor(textparse): skip classic series instead of adding NHCB around
Do not return the first classic series from the EntryType state,
switch to EntrySeries. This means we need to start the histogram
field state from -3 , not -2.
In EntrySeries, skip classic series if needed.
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
* reuse nhcb converter
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
* test(textparse/nhcb): test corrupting metric family name
NHCB parse doesn't always copy the metric name from the underlying
parser. When called via HELP, UNIT, the string is directly referenced
which means that the read-ahead of NHCB can corrupt it.
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
See
https://pkg.go.dev/golang.org/x/tools/gopls/internal/analysis/modernize
for details.
This ran into a few issues (arguably bugs in the modernize tool),
which I will fix in the next commit, so that we have transparency what
was done automatically.
Beyond those hiccups, I believe all the changes applied are
legitimate. Even where there might be no tangible direct gain, I would
argue it's still better to use the "modern" way to avoid micro
discussions in tiny style PRs later.
Signed-off-by: beorn7 <beorn@grafana.com>
Add `ByteSize()` method to different labels implementations.
One of the use case so that we can track the memory used by Labels.
Signed-off-by: Jon Kartago Lamida <me@lamida.net>
This requires a bit of repetition to cover all the different builds, but
it seems worth checking that the function does what is expected.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
Instead of using varint to encode the size of each label, use a single
byte for size 0-254, or a flag value of 255 followed by the size in
3 bytes little-endian.
This reduces the amount of code, and also the number of branches in
commonly-executed code, so it runs faster.
The maximum allowed label name or value length is now 2^24 or 16MB.
Memory used by labels changes as follows:
* Labels from 0 to 127 bytes length: same
* From 128 to 254: 1 byte less
* From 255 to 16383: 2 bytes more
* From 16384 to 2MB: 1 byte more
* From 2MB to 16MB: same
Labels: panic on string too long.
Slightly more user-friendly than encoding bad data and finding out when
we decode.
Clarify that Labels.Bytes() encoding can change
---------
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
This removes the stringlabels build tag, makes that implementation the default one, and moves the old labels implementation under the slicelabels build tag.
Fixes#16064.
Signed-off-by: Lukasz Mierzwa <l.mierzwa@gmail.com>
Global and Data Source configurations can specify legacy mode, but Prometheus now requires that the overall validation mode be set to UTF-8
Signed-off-by: Owen Williams <owen.williams@grafana.com>
* [ENHANCEMENT] TSDB: Improve calculation of space used by labels
The labels for each series in the Head take up some some space in the
Postings index, but far more space in the `memSeries` structure.
Instead of having the Postings index calculate this overhead, which is
a layering violation, have the caller pass in a function to do it.
Provide three implementations of this function for the three Labels
versions.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
Fixes UTF-8 aggregator label list items getting mutated with quote marks when String-ified.
Fixes quoted metric names not supported in metric declarations.
Fixes UTF-8 label names not being quoted when String-ified.
Fixes https://github.com/prometheus/prometheus/issues/15470
Fixes https://github.com/prometheus/prometheus/issues/15528
Signed-off-by: Owen Williams <owen.williams@grafana.com>
Co-authored-by: Bryan Boreham <bjboreham@gmail.com>
Up to 32-byte values this saves garbage, runs faster.
For prefixes, only `toLower` the part we need for the map lookup.
Split toNormalisedLower into fast and slow paths, to avoid a penalty
for the `copy` call in the case where no allocations are done.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
Shortcut for `.*` matches newlines as well.
Add preamble change ^(?s:
Add test
dotAll flag por al regex
Add and fix regex tests
Signed-off-by: Mario Fernandez <mariofer@redhat.com>
fix(utf8): ensure correct validation when legacy mode turned on
This depends on the included update of the prometheus/common dependency.
---------
Signed-off-by: Owen Williams <owen.williams@grafana.com>
Several things done here:
- Set `max-issues-per-linter` to 0 so that we actually see all linter
warnings and not just 50 per linter. (As we also set
`max-same-issues` to 0, I assume this was the intention from the
beginning.)
- Stop using the golangci-lint default excludes (by setting
`exclude-use-default: false`. Those are too generous and don't match
our style conventions. (I have re-added some of the excludes
explicitly in this commit. See below.)
- Re-add the `errcheck` exclusion we have used so far via the
defaults.
- Exclude the signature requirement `govet` has for `Seek` methods
because we use non-standard `Seek` methods a lot. (But we keep other
requirements, while the default excludes completely disabled the
check for common method segnatures.)
- Exclude warnings about missing doc comments on exported symbols. (We
used to be pretty adamant about doc comments, but stopped that at
some point in the past. By now, we have about 500 missing doc
comments. We may consider reintroducing this check, but that's
outside of the scope of this commit. The default excludes of
golangci-lint essentially ignore doc comments completely.)
- By stop using the default excludes, we now get warnings back on
malformed doc comments. That's the most impactful change in this
commit. It does not enforce doc comments (again), but _if_ there is
a doc comment, it has to have the recommended form. (Most of the
changes in this commit are fixing this form.)
- Improve wording/spelling of some comments in .golangci.yml, and
remove an outdated comment.
- Leave `package-comments` inactive, but add a TODO asking if we
should change that.
- Add a new sub-linter `comment-spacings` (and fix corresponding
comments), which avoids missing spaces after the leading `//`.
Signed-off-by: beorn7 <beorn@grafana.com>
Since `seps` is a variable, `seps[0]` has to be bounds-checked every
time. Replacing with a constant everywhere it is used skips this
overhead.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
For example `foo.*|bar.*|baz.*`. Instead of checking each one in turn,
we build a map of prefixes, then check the smaller set that could match
the string supplied.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
* Improve testing and readability
Address review comments on #13843
Signed-off-by: Marco Pracucci <marco@pracucci.com>
Symbol tables with fewer than 128 entries, so everything can be
represented as a single byte, are not realistic.
Stuff the symbol table with fake entries before adding the real ones.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
Inline (by copy-paste) the fast path of `decodeVarint` in various
places where it gets called a lot.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
When the label name is empty, which can happen now with quoted label
name, it should be quoted when printed as a string again.
Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
* Converted string to standarized form
* Added golang.org/x/text in Go dependencies
* Added test cases for FastRegexMatcher
* Added benchmark for toNormalizedLower
Signed-off-by: RA <ranveeravhad777@gmail.com>
This replaces the custom `moreThanOneRune` function with the standard
`utf8.DecodeRuneInString(s)` that can be used to figure out the size of
the first rune.
Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
This adds some more test cases for unicode values, and also a benchmark
for zeroOrOneCharacterStringMatcher.Matches()
Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>