This commit migrates events data in such a way that push events are
stored much more efficiently. This is done by creating a shadow table
called "events_for_migration", and a table called "push_event_payloads"
which is used for storing push data of push events. The background
migration in this commit will copy events from the "events" table into
the "events_for_migration" table, push events in will also have a row
created in "push_event_payloads".
This approach allows us to reclaim space in the next release by simply
swapping the "events" and "events_for_migration" tables, then dropping
the old events (now "events_for_migration") table.
The new table structure is also optimised for storage space, and does
not include the unused "title" column nor the "data" column (since this
data is moved to "push_event_payloads").
== Newly Created Events
Newly created events are inserted into both "events" and
"events_for_migration", both using the exact same primary key value. The
table "push_event_payloads" in turn has a foreign key to the _shadow_
table. This removes the need for recreating and validating the foreign
key after swapping the tables. Since the shadow table also has a foreign
key to "projects.id" we also don't have to worry about orphaned rows.
This approach however does require some additional storage as we're
duplicating a portion of the events data for at least 1 release. The
exact amount is hard to estimate, but for GitLab.com this is expected to
be between 10 and 20 GB at most. The background migration in this commit
deliberately does _not_ update the "events" table as doing so would put
a lot of pressure on PostgreSQL's auto vacuuming system.
== Supporting Both Old And New Events
Application code has also been adjusted to support push events using
both the old and new data formats. This is done by creating a PushEvent
class which extends the regular Event class. Using Rails' Single Table
Inheritance system we can ensure the right class is used for the right
data, which in this case is based on the value of `events.action`. To
support displaying old and new data at the same time the PushEvent class
re-defines a few methods of the Event class, falling back to their
original implementations for push events in the old format.
Once all existing events have been migrated the various push event
related methods can be removed from the Event model, and the calls to
`super` can be removed from the methods in the PushEvent model.
The UI and event atom feed have also been slightly changed to better
handle this new setup, fortunately only a few changes were necessary to
make this work.
== API Changes
The API only displays push data of events in the new format. Supporting
both formats in the API is a bit more difficult compared to the UI.
Since the old push data was not really well documented (apart from one
example that used an incorrect "action" nmae) I decided that supporting
both was not worth the effort, especially since events will be migrated
in a few days _and_ new events are created in the correct format.
When processing push payloads we now schedule at most the 100 most
recent commits, instead of all commits that were in a payload. This
prevents one from overloading the system by pushing thousands if not
millions of commits in a single go.
Fixes https://gitlab.com/gitlab-org/gitlab-ce/issues/25827
This adds counters for build artifacts and LFS objects, and moves
the preexisting repository_size and commit_count from the projects
table into a new project_statistics table.
The counters are displayed in the administration area for projects
and groups, and also available through the API for admins (on */all)
and normal users (on */owned)
The statistics are updated through ProjectCacheWorker, which can now
do more granular updates with the new :statistics argument.
By passing commit data to this worker we remove the need for querying
the Git repository for every job. This in turn reduces the time spent
processing each job.
The migration included migrates jobs from the old format to the new
format. For this to work properly it requires downtime as otherwise
workers may start producing errors until they're using a newer version
of the worker code.
This refactors repository caching so it's possible to selectively
refresh certain caches, instead of just expiring and refreshing
everything.
To allow this the various methods that were cached (e.g. "tag_count" and
"readme") use a similar pattern that makes expiring and refreshing
their data much easier.
In this new setup caches are refreshed as follows:
1. After a commit (but before running ProjectCacheWorker) we expire some
basic caches such as the commit count and repository size.
2. ProjectCacheWorker will recalculate the commit count, repository
size, then refresh a specific set of caches based on the list of
files changed in a push payload.
This requires a bunch of changes to the various methods that may be
cached. For one, data should not be cached if a branch used or the
entire repository does not exist. To prevent all these methods from
handling this manually this is taken care of in
Repository#cache_method_output. Some methods still manually check for
the existence of a repository but this result is also cached.
With selective flushing implemented ProjectCacheWorker no longer uses an
exclusive lease for all of its work. Instead this worker only uses a
lease to limit the number of times the repository size is updated as
this is a fairly expensive operation.
Use `Gitlab.config.gitlab.host` over `'localhost'`
Use `Gitlab.config.gitlab.host` over `'localhost'`
This would fix long standing failures running tests on
my development machine, which set `Gitlab.config.gitlab.host`
to another host because it's not my local computer. Now I
finally cannot withstand it and decided to fix them once and
for all.
See merge request !7562
This would fix long standing failures running tests on
my development machine, which set `Gitlab.config.gitlab.host`
to another host because it's not my local computer. Now I
finally cannot withstand it and decided to fix them once and
for all.
This moves the code used for processing commits from GitPushService to
its own Sidekiq worker: ProcessCommitWorker.
Using a Sidekiq worker allows us to process multiple commits in
parallel. This in turn will lead to issues being closed faster and cross
references being created faster. Furthermore by isolating this code into
a separate class it's easier to test and maintain the code.
The new worker also ensures it can efficiently check which issues can be
closed, without having to run numerous SQL queries for every issue.
These specs use raw Redis objects which can not use the memory based
caching mechanism used for tests. As such we have to explicitly flush
the data from Redis before/after each spec to ensure no data lingers on.
The amount of precision times have in databases is variable, so we need
tolerances when comparing in specs. It's better to have the tolerance defined
in one place than several.
This commit adds a number of _html columns and, with the exception of Note,
starts updating them whenever the content of their partner fields changes.
Note has a collision with the note_html attr_accessor; that will be fixed later
A background worker for clearing these cache columns is also introduced - use
`rake cache:clear` to set it off. You can clear the database or Redis caches
separately by running `rake cache:clear:db` or `rake cache:clear:redis`,
respectively.
All the code that pre-calculates metrics for use in the cycle analytics
page.
- Ci::Pipeline -> build start/finish
- Ci::Pipeline#merge_requests
- Issue -> record default metrics after save
- MergeRequest -> record default metrics after save
- Deployment -> Update "first_deployed_to_production_at" for MR metrics
- Git Push -> Update "first commit mention" for issue metrics
- Merge request create/update/refresh -> Update "merge requests closing issues"
A customer ran into an issue where a Sidekiq task retried over and over, leading to duplicate
master branches in their protected branch list.
Closes#22177
1. Change a few incorrect `access_level` to `access_levels.first` that
were missed in e805a64.
2. `API::Entities` can iterate over all access levels instead of just
the first one. This makes no difference to CE, and makes it more compatible
with EE.
!581 has a lot of changes that would cause merge conflicts if not
properly backported to CE. This commit/MR serves as a better
foundation for gitlab-org/gitlab-ee!581.
= Changes =
1. Move from `has_one {merge,push}_access_level` to `has_many`, with the
`length` of the association limited to `1`. This is _effectively_ a
`has_one` association, but should cause less conflicts with EE, which
is set to `has_many`. This has a number of related changes in the
views, specs, and factories.
2. Make `gon` variable loading more consistent (with EE!581) in the
`ProtectedBranchesController`. Also use `::` to prefix the
`ProtectedBranches` services, because this is required in EE.
3. Extract a `ProtectedBranchAccess` concern from the two access level
models. This concern only has a single `humanize` method here, but
will have more methods in EE.
4. Add `form_errors` to the protected branches creation form. This is
not strictly required for EE compatibility, but was an oversight
nonetheless.
1. It makes sense to reuse these constants since we had them duplicated
in the previous enum implementation. This also simplifies our
`check_access` implementation, because we can use
`project.team.max_member_access` directly.
2. Use `accepts_nested_attributes_for` to create push/merge access
levels. This was a bit fiddly to set up, but this simplifies our code
by quite a large amount. We can even get rid of
`ProtectedBranches::BaseService`.
3. Move API handling back into the API (previously in
`ProtectedBranches::BaseService#translate_api_params`.
4. The protected branch services now return a `ProtectedBranch` rather
than `true/false`.
5. Run `load_protected_branches` on-demand in the `create` action, to
prevent it being called unneccessarily.
6. "Masters" is pre-selected as the default option for "Allowed to Push"
and "Allowed to Merge".
7. These changes were based on a review from @rymai in !5081.
1. Caused by incorrect test setup. The user wasn't added to the project,
so protected branch creation failed authorization.
2. Change setup for a different test (`Event.last` to
`Event.find_by_action`) because our `project.team << ...` addition
was causing a conflict.