* Follow Prometheus naming conventions[0].
* Simplify metrics by adding response lables to the histogram.
* Use standard `http_request_duration_seconds_...` names for the histogram.
[0]: https://prometheus.io/docs/practices/naming/#metric-names
`endpoint.route` is calling `env[Grape::Env::GRAPE_ROUTING_ARGS][:route_info]`
but `env[Grape::Env::GRAPE_ROUTING_ARGS]` is `nil` in the case of a 405 response
Signed-off-by: Rémy Coutable <remy@rymai.me>
GitLab Performance Monitoring is now able to track custom events not
directly related to application performance. These events include the
number of tags pushed, repositories created, builds registered, etc.
The use of these events is to get a better overview of how a GitLab
instance is used and how that may affect performance. For example, a
large number of Git pushes may have a negative impact on the underlying
storage engine.
Events are stored in the "events" measurement and are not prefixed with
"rails_" or "sidekiq_", this makes it easier to query events with the
same name triggered from different parts of the application. All events
being stored in the same measurement also makes it easier to downsample
data.
Currently the following events are tracked:
* Creating repositories
* Removing repositories
* Changing the default branch of a repository
* Pushing a new tag
* Removing an existing tag
* Pushing a commit (along with the branch being pushed to)
* Pushing a new branch
* Removing an existing branch
* Importing a repository (along with the URL we're importing)
* Forking a repository (along with the source/target path)
* CI builds registered (and when no build could be found)
* CI builds being updated
* Rails and Sidekiq exceptions
Fixesgitlab-org/gitlab-ce#13720
This reduces the overhead of the method instrumentation code primarily
by reducing the number of method calls. There are also some other small
optimisations such as not casting timing values to Floats (there's no
particular need for this), using Symbols for method call metric names,
and reducing the number of Hash lookups for instrumented methods.
The exact impact depends on the code being executed. For example, for a
method that's only called once the difference won't be very noticeable.
However, for methods that are called many times the difference can be
more significant.
For example, the loading time of a large commit
(nrclark/dummy_project@81ebdea5df)
was reduced from around 19 seconds to around 15 seconds using these
changes.
Process.clock_gettime allows getting the real time in nanoseconds as
well as allowing one to get a monotonic timestamp. This offers greater
accuracy without the overhead of having to allocate a Time instance. In
general using Time.now/Time.new is about 2x slower than using
Process.clock_gettime(). For example:
require 'benchmark/ips'
Benchmark.ips do |bench|
bench.report 'Time.now' do
Time.now.to_f
end
bench.report 'clock_gettime' do
Process.clock_gettime(Process::CLOCK_MONOTONIC, :millisecond)
end
bench.compare!
end
Running this benchmark gives:
Calculating -------------------------------------
Time.now 108.052k i/100ms
clock_gettime 125.984k i/100ms
-------------------------------------------------
Time.now 2.343M (± 7.1%) i/s - 11.670M
clock_gettime 4.979M (± 0.8%) i/s - 24.945M
Comparison:
clock_gettime: 4979393.8 i/s
Time.now: 2342986.8 i/s - 2.13x slower
Another benefit of using Process.clock_gettime() is that we can simplify
the code a bit since it can give timestamps in nanoseconds out of the
box.
Previously we'd create a separate Metric instance for every method call
that would exceed the method call threshold. This is problematic because
it doesn't provide us with information to accurately get the _total_
execution time of a particular method. For example, if the method
"Foo#bar" was called 4 times with a runtime of ~10 milliseconds we'd end
up with 4 different Metric instances. If we were to then get the
average/95th percentile/etc of the timings this would be roughly 10
milliseconds. However, the _actual_ total time spent in this method
would be around 40 milliseconds.
To solve this problem we now create a single Metric instance per method.
This Metric instance contains the _total_ real/CPU time and the call
count for every instrumented method.
We can't do a lot with classes without names as we can't filter by them,
have no idea where they come from, etc. As such it's best to just ignore
these.
By default instrumentation will instrument public,
protected and private methods, because usually
heavy work is done on private method or at least
that’s what facts is showing
Generating the following tags
Grape#GET /projects/:id/archive
from Grape::Route objects like
{ :path => /:version/projects/:id/archive(.:format)
:version => “v3”,
:method => “GET” }
Use an instance variable to cache raw_path transformations.
This variable is only going to growth to the number of
endpoints of the API, not with exact different requests
We can store this cache as an instance variable because
middleware are initialised only once
Because method call timings are inclusive (that is, they include the
time of any sub method calls) this would lead to the total method
execution time often being far greater than the total transaction time.
Because this is incredibly confusing it's best to simply _not_ track the
total method execution time, after all it's not that useful to begin
with.
Fixesgitlab-org/gitlab-ce#17239