Previously, both InfluxSampler and RubySampler were relying on the
`GC::Profiler.total_time` data which is the sum over the list
of captured GC events. Also, both samplers asynchronously called
`GC::Profiler.clear` which led to incorrect metric data because
each sampler has the wrong assumption it is the only object who calls
`GC::Profiler.clear` and thus could rely on the gathered results between
such calls.
We should ensure that `GC::Profiler.total_time` is called only in one
place making it possible to rely on accumulated data between such wipes.
Also, we need to track the amount of profiler reports we lost.
In some cases (during worker start) it's possible that
Puma.stats returns an empty hash for worker's last status. In
that case we just skip sampling of the worker until these
stats are available.
This adds ruby and unicorn instrumentation. This was originally
intended in 11.11 but due to performance concerns it was reverted. This
new commit foregoes the sys-proctable gem was causing performance issues
previously.
This updates monitor docs to reflect the new ruby and unicorn metrics as
well as making it so we fetch process start time via the proc table
instead of via CLOCK_BOOTTIME
This adds new metrics for monitoring unicorn. These metrics include
process_cpu_seconds_total, process_start_time_seconds, process_max_fds,
and unicorn_workers.
* Use a simple counter for sampler duration instead of a histogram.
* Use a counter to collect GC time.
* Remove unused objects metric.
* Cleanup metric names to match Prometheus conventions.
* Prefix generic GC stats with `gc_stat`.
* Include worker label on memory and file descriptor metrics.
In MR https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/15834 we
removed use of the data produced by the Allocations Gem. However, we
never removed the code that just enables tracking of allocations. In
this commit we remove all remaining traces of this Gem.
On GitLab.com, InfluxSampler#sample_objects appears to take 1.2 s or so to
iterate through 1059 objects. This had led to delays of a couple hundred
milliseconds in processing in the main thread. Remove this code since it's not
really being used.
Closesgitlab-com/infrastructure#3250