https://gitlab.com/gitlab-org/gitlab-ce/issues/62971
Adds support to EnvironmentsController#metrics_dashboard
for the following params: group, title, y_label
These params are used to uniquely identify a panel on
the metrics dashboard.
Metrics are stored in several places, so this adds
utilities to find a specific panel from the database
or filesystem depending on the metric specified.
Also moves some shared utilities into separate classes,
notably default values and errors.
Previously, both InfluxSampler and RubySampler were relying on the
`GC::Profiler.total_time` data which is the sum over the list
of captured GC events. Also, both samplers asynchronously called
`GC::Profiler.clear` which led to incorrect metric data because
each sampler has the wrong assumption it is the only object who calls
`GC::Profiler.clear` and thus could rely on the gathered results between
such calls.
We should ensure that `GC::Profiler.total_time` is called only in one
place making it possible to rely on accumulated data between such wipes.
Also, we need to track the amount of profiler reports we lost.
In preparation for embedding specific metrics in issues
https://gitlab.com/gitlab-org/gitlab-ce/issues/62971,
this commit moves the BaseService for metrics dashboards
to a new services subdirectory. This is purely for the sake
of organization and maintainability.
* Remove `controller` and `action` labels from duration histogram.
* Create a new simple counter for `controller` and `action`.
* Adjust histogram buckets to observe smaller response times.
Adds GFM Pipline filters to insert a placeholder in the generated
HTML from GFM based on the presence of a metrics dashboard link.
The front end should look for the class 'js-render-metrics' to
determine if it should replace the element with metrics charts.
The data element 'data-dashboard-url' should be the endpoint
the front end should hit in order to obtain a dashboard layout
in order to appropriately render the charts.
Adds permission checks to the metrics_dashboard endpoint. Users
with role of Reporter or above should have access to view the
metrics for a given project.
This commits adds support for metrics dashboards
for embedding. If the flag 'embedded' is provided
to the environments/id/metrics_dashboard endpoint,
the response will be suitable for embedding in
issues or other content.
This is a precursor for support for embedding
metrics in GFM.
Error classes associated with individual stages of
dashboard processing tend to have very long names.
As dashboard post-processing includes more steps,
we will likely need to handle more error cases.
Refactoring to have all errors inherit from a specific
base class will help accommodate this and keep the code
more readable.
Adds a new stage to dashboard processesing step for the
EnvironmentsController::metrics_dashboard endpoint.
Allows the front end to avoid generating the endpoint
unitutive string mutations.
In some cases (during worker start) it's possible that
Puma.stats returns an empty hash for worker's last status. In
that case we just skip sampling of the worker until these
stats are available.
This adds ruby and unicorn instrumentation. This was originally
intended in 11.11 but due to performance concerns it was reverted. This
new commit foregoes the sys-proctable gem was causing performance issues
previously.
Updates the EnvironmentController#metrics_dashboard endpoint
to support a "dashboard" param, which can be used to specify
the filepath of a dashboard configuration from a project
repository. Dashboard configurations are expected to be
stored in .gitlab/dashboards/.
Updates dashboard post-processing steps to exclude custom
metrics, which should only display on the system dashboard.
This updates monitor docs to reflect the new ruby and unicorn metrics as
well as making it so we fetch process start time via the proc table
instead of via CLOCK_BOOTTIME
This adds new metrics for monitoring unicorn. These metrics include
process_cpu_seconds_total, process_start_time_seconds, process_max_fds,
and unicorn_workers.
* Use a simple counter for sampler duration instead of a histogram.
* Use a counter to collect GC time.
* Remove unused objects metric.
* Cleanup metric names to match Prometheus conventions.
* Prefix generic GC stats with `gc_stat`.
* Include worker label on memory and file descriptor metrics.
In MR https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/15834 we
removed use of the data produced by the Allocations Gem. However, we
never removed the code that just enables tracking of allocations. In
this commit we remove all remaining traces of this Gem.
There seem to be a lot of cases where the suffix of an action (e.g.
".html") is set to bogus data, such as "*/*" or entire URLs. This can
increase cardinality of our metrics, and isn't very useful for
monitoring and filtering. To work around this, we enforce a whitelist
containing a few suffixes we actually care about. Suffixes not supported
will be grouped under the action without a suffix. This means that a
request to "FooController#bar.jpeg" will be assigned to
"FooController#bar".
Certain controllers (e.g. Doorkeeper::TokensController) don't expose the
method "request_format". This commit changes
Gitlab::Metrics::WebTransaction so we don't rely on this method, instead
using the underlying code this method uses.
Fixes https://gitlab.com/gitlab-org/gitlab-ce/issues/46412
The method "content_type" on a controller does not always return the
correct content type. On the other hand, the method "request_format"
does _and_ immediately returns a Symbol (e.g. :json) instead of a
mime-type name (e.g. application/json). With these changes metrics
should again report their action names correctly.
Fixes https://gitlab.com/gitlab-com/infrastructure/issues/3499
The Sidekiq exporter logs were mixing with the normal Sidekiq logs. In order
to support structured logging in Sidekiq, we either need to split this data
out or convert the exporter to produce structured logs. Since Sidekiq job
processing is fundamentally different information from Web server traffic,
it seems cleaner to move the metrics traffic into a separate file, where they
can be parsed by a different filter if needed.
Relates to #20060
Reduce cardinality of some of GitLab's Prometheus metrics and fix observed duration reporting.
Closes#41045
See merge request gitlab-org/gitlab-ce!15881
i.e.
Using compare and swap we update the expires_at value.
The thread that actually is able to update this value will also set
the cache holding method_call enabled state
On GitLab.com, InfluxSampler#sample_objects appears to take 1.2 s or so to
iterate through 1059 objects. This had led to delays of a couple hundred
milliseconds in processing in the main thread. Remove this code since it's not
really being used.
Closesgitlab-com/infrastructure#3250