Previously, this would issue a query for each unique `diff_refs_or_sha`
passed. This was because we didn't want to load other MR diffs into memory, as
they had some very large columns.
Now they are actually very small, and it's more efficient to just load them all
at once and do the finding in Ruby.
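A minimal sketch of the idea in plain Ruby rather than ActiveRecord. The `SketchDiff` struct and the `find_diff` helper are hypothetical names used for illustration only; the point is that every diff is loaded once and each lookup becomes an in-memory search:

```ruby
# Hypothetical stand-in for a merge request diff row; not the real model.
SketchDiff = Struct.new(:id, :head_commit_sha, :diff_refs)

# Find the diff matching either a diff_refs value or a plain SHA string.
# `diffs` is assumed to be loaded already, so no query is issued per lookup.
def find_diff(diffs, diff_refs_or_sha)
  diffs.find do |diff|
    diff.diff_refs == diff_refs_or_sha || diff.head_commit_sha == diff_refs_or_sha
  end
end
```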
Previously, we kept the highlight results for every diff in the cache. We don't
need them for older diffs: if someone does view one (which is rare), we can do
the highlighting on the fly.
On MySQL, at least, `Note#created_at` doesn't seem to store fractional seconds,
while `MergeRequest::Metrics#merged_at` does. This breaks the optimization
assumption that we only need to search for notes created *after* the MR has
been merged.
Unsynchronized system clocks also make this a dangerous assumption to make.
Adding a minute of leeway still optimizes away most notes, but allows both
cases to be handled more gracefully. If the system clocks are more than a
minute out, we'll still be broken, of course.
If we search for notes before the MR was merged, we have to load every commit
that was ever part of the MR, or mentioned in a push. In extreme cases, this can
be tens of thousands of commits to load, but we know they can't revert the merge
commit, because they are from before the MR was merged.
In the (rare) case that we don't have a `merged_at` value for the MR, we can
still search all notes.
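The leeway and the fallback can be sketched roughly as follows, in plain Ruby. The constant and method names are hypothetical, and notes are represented as simple hashes rather than `Note` records:

```ruby
# Hypothetical constant: one minute of leeway, as described above.
MERGED_AT_LEEWAY_SECONDS = 60

# Returns the earliest created_at a note may have to be worth checking,
# or nil when merged_at is unknown and every note has to be searched.
def notes_cutoff(merged_at)
  return nil if merged_at.nil?

  merged_at - MERGED_AT_LEEWAY_SECONDS
end

# Keep only the notes created at or after the cutoff.
def candidate_notes(notes, merged_at)
  cutoff = notes_cutoff(merged_at)
  return notes if cutoff.nil?

  notes.select { |note| note[:created_at] >= cutoff }
end
```

A note created 30 seconds before `merged_at` (for example because of truncated fractional seconds or clock skew) is still considered, while one created two minutes earlier is skipped.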
This removes all usage of soft removals except for the "pending delete"
system implemented for projects. This in turn simplifies all the query
plans of the models that used soft removals. Since we don't really use
soft removals for anything useful, there's no point in keeping them around.
This _does_ mean that hard removals of issues (which only admins can do
if I'm not mistaken) can influence the "iid" values, but that code is
broken to begin with. More on this (and how to fix it) can be found in
https://gitlab.com/gitlab-org/gitlab-ce/issues/31114.
Fixes https://gitlab.com/gitlab-org/gitlab-ce/issues/37447
When a project uses the fast-forward merge strategy, the user has to
rebase an MR onto the target branch before it can be merged.
Now the user can rebase from the UI by clicking the 'Rebase' button
instead of rebasing locally.
This feature was already present in EE; this is only a backport
of the feature to CE. A couple of changes:
* removed rebase license check
* renamed migration (changed timestamp)
Closes #40301
The hook ordering influenced the diffs being generated as these used
values from before the update due to the memoization still being in
place. This commit reorders them and tests against this behaviour.
The Gitaly CommitService is being hammered by N+1 calls, mostly when
finding commits. This led to this gRPC being turned off on production:
https://gitlab.com/gitlab-org/gitaly/issues/514#note_48991378
Hunting down where they came from, most of them were due to
MergeRequest#show. To prove this, I set up a script to request the
MergeRequest#show page 50 times. The GDK was being scraped by
Prometheus, where we have metrics on controller#action and the Gitaly
calls they perform. On both occasions I restarted the full GDK so all
caches had to be rebuilt.
Current master, 806a68a81f, needed 435 requests
After this commit, 154 requests
This throttles the number of UPDATE queries that can be triggered by
calling "touch" on a Note, Issue, or MergeRequest. For Note objects we
also take care of updating the associated "noteable" relation in a
smarter way than Rails does by default.
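A rough sketch of what throttling "touch" could look like, in plain Ruby. The interval, the class, and the method names are assumptions made for illustration; the message above does not state the actual window or implementation:

```ruby
# Hypothetical throttle window; the real interval is not stated above.
TOUCH_INTERVAL_SECONDS = 300

# Hypothetical in-memory record standing in for a Note, Issue, or MergeRequest.
class TouchableRecord
  attr_reader :updated_at, :update_count

  def initialize(updated_at)
    @updated_at = updated_at
    @update_count = 0
  end

  # Only "write" when the last update is older than the throttle window.
  # Returns true when an update happened, false when it was skipped.
  def throttled_touch(now = Time.now)
    return false if now - updated_at < TOUCH_INTERVAL_SECONDS

    @updated_at = now
    @update_count += 1 # stands in for one UPDATE query
    true
  end
end
```

Repeated calls inside the window leave `update_count` unchanged, which is the reduction in UPDATE queries the commit is after.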
If a merge request was created with a branch name that also matched a tag name,
we'd generate a comparison to or from the tag respectively, rather than the
branch. Merging would still use the branch, of course.
To avoid this, ensure that when we get the branch heads, we prepend the
reference prefix for branches, which will ensure that we generate the correct
comparison.
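As an illustration, qualifying a branch name with the standard Git branch prefix looks roughly like this. The `branch_ref` helper is a hypothetical name for this sketch; `refs/heads/` itself is the standard Git namespace for branches (tags live under `refs/tags/`):

```ruby
# Standard Git prefix for branch refs.
BRANCH_REF_PREFIX = 'refs/heads/'

# Fully qualify a branch name so it cannot be resolved as a tag of the
# same name. Already-qualified refs are returned unchanged.
def branch_ref(branch_name)
  return branch_name if branch_name.start_with?(BRANCH_REF_PREFIX)

  "#{BRANCH_REF_PREFIX}#{branch_name}"
end
```

With a branch and a tag both called `v1.0`, comparing against `refs/heads/v1.0` is unambiguous, whereas the bare name `v1.0` may resolve to the tag.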
The st_commits and st_diffs columns on merge_request_diffs historically held the
YAML-serialised data for a merge request diff, in a variety of formats.
Since 9.5, these have been migrated in the background to two new tables:
merge_request_diff_commits and merge_request_diff_files. That has the advantage
that we can actually query the data (for instance, to find out how many commits
we've stored), and that it can't be in a variety of formats, but must match the
new schema.
This is the final step of that journey, where we drop those columns and remove
all references to them. This is a breaking change to the importer, because we
can no longer import diffs created in the old format, and we cannot guarantee
the export will be in the new format unless it was generated after this commit.
Compared to the merge_request_diff association:
1. It's simpler to query. The query uses a foreign key to the
merge_request_diffs table, so no ordering is necessary.
2. It's faster for preloading. The merge_request_diff association has to load
every diff for the MRs in the set, then discard all but the most recent for
each. This association means that Rails can just query for N diffs from N
MRs.
3. It's more complicated to update. This is a bidirectional foreign key, so we
need to update two tables when adding a diff record. This also means we need
to handle this as a special case when importing a GitLab project.
There is some juggling with this association in the merge request model:
* `MergeRequest#latest_merge_request_diff` is _always_ the latest diff.
* `MergeRequest#merge_request_diff` reuses
`MergeRequest#latest_merge_request_diff` unless:
* Arguments are passed. These are typically to force-reload the association.
* It doesn't exist. That means we might be trying to implicitly create a
diff. This only seems to happen in specs.
* The association is already loaded. This is important for the reasons
explained in the comment, which I'll reiterate here: if we a) load a
non-latest diff, then b) get its `merge_request`, then c) get that MR's
`merge_request_diff`, we should get the diff we loaded in a), even though
that's not the latest diff.
Basically, `MergeRequest#merge_request_diff` is the latest diff in most cases,
but not quite all.
When we consider 'all' pipelines for MRs, we now mean:
1. The last 10,000 commits (unordered).
2. From the last 100 MR versions (newest first).
This seems to fix the MRs that time out on GitLab.com.
Use Commit#notes and Note.for_commit_id when possible to make sure we use all indexes available to us
Closes #34509
See merge request gitlab-org/gitlab-ce!15253
Also, I renamed the MergeRequest#fetch_ref method to express
the side effect that this method has.
MergeRequest#fetch_ref -> MergeRequest#fetch_ref!
Repository#fetch_source_branch -> Repository#fetch_source_branch!
Resolve "ActiveRecord::StatementInvalid: PG::QueryCanceled: ERROR: canceling statement due to statement timeout"
Closes #39054
See merge request gitlab-org/gitlab-ce!15063
For MRs with many thousands of commits, `SELECT DISTINCT(sha)` will be very
slow.
What we can't do to fix this:
1. Add an index. Postgres won't use it for DISTINCT without a lot of ceremony.
2. Do the `uniq` in Ruby. That can still be very slow with hundreds of
thousands of commits.
3. Use a subquery. We haven't removed the `st_commits` column yet, but we will
soon.
Until 3 is available to us, we can just do 2, but also add a limit clause. There
is no ordering, so this may return different results, but our goal with these
MRs is just to get them to load, so it's not a huge deal.
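A minimal sketch of option 2 combined with a limit, in plain Ruby. The cap and the method name are assumptions for illustration; the message above does not state the actual limit:

```ruby
# Hypothetical cap; the actual limit is not stated above.
COMMITS_LIMIT = 10_000

# Take at most `limit` rows (as a LIMIT clause would), then de-duplicate
# in Ruby instead of asking the database for DISTINCT.
def limited_unique_shas(shas, limit = COMMITS_LIMIT)
  shas.first(limit).uniq
end
```

Because the rows are unordered, which SHAs fall inside the limit can vary between calls, which is the trade-off accepted above.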
In GitLab EE, a GitLab instance can be read-only (e.g. when it's a Geo
secondary node). But in GitLab CE it might also be useful to have the
"read-only" idea around, so port it back to GitLab CE.
Having the principle of read-only in GitLab CE should also lead to
fewer errors being introduced by write operations that aren't allowed
for read-only calls.
Closes gitlab-org/gitlab-ce#37534.
MergeRequest#create_merge_request_diff and MergeRequest#reload_diff are
the only places where we generate a new MR diff so that's where we
should fetch the ref.
This also ensures that the ref is not fetched when we call
merge_request.merge_request_diffs.create in
Github::Import#fetch_pull_requests.
Signed-off-by: Rémy Coutable <remy@rymai.me>
In this particular case the use of UNION ALL leads to a better query
plan compared to using 1 big query that uses an OR statement to combine
different data sources.
See https://gitlab.com/gitlab-org/gitlab-ce/issues/38508 for more
information.
This ensures the open issues/MR count caches are refreshed properly when
creating new issues or MRs. This MR also includes a change to the cache
keys to ensure all caches are rebuilt on the fly.
This particular problem was not caught in the test suite due to a null
cache being used, resulting in all calls that would use a cache using
the underlying data directly. In production the code would fail because
a newly saved record returns an empty hash in #changes meaning checks
such as `state_changed? || confidential_changed?` would return false for
new rows, thus never updating the counters.
Fixes https://gitlab.com/gitlab-org/gitlab-ce/issues/38061
This ensures the issues/MR cache of the sidebar is only updated when the
state or confidential flags change, instead of being updated on every
update.
* upstream/master: (225 commits)
Add changelog entry
Backports EE 2756 logic to CE.
Make rubocop happy
Make profile settings dropdown consistent
Add filter by my reaction
Update spec initialization with it being a shared component
Update identicon path and selector
Renamed to `identicon` and make shared component
Merge branch 'master-i18n' into 'master'
Fix broken Frontend JS guide
Replace 'project/star.feature' spinach test with an rspec analog
Adds position fixed to right sidebar
Fixes the margin of the top buttons of the pipeline page
Remove commented out code
Better align fallback image emojis
Decrease Metrics/CyclomaticComplexity threshold to 15
Add changelog
Respect the default visibility level when creating a group
Further break with_repo_branch_commit into parts
Make sure inspect doesn't generate crazy string
...
Every project page displays a navigation menu that in turn displays the
number of open issues and merge requests. This means that for every
project page we run two COUNT(*) queries, each taking up roughly 30
milliseconds on GitLab.com. By caching these numbers and refreshing them
whenever necessary we can reduce loading times of all these pages by up
to roughly 60 milliseconds.
The number of open issues does not include confidential issues. This is
a trade-off to keep the code simple and to ensure refreshing the data
only needs 2 COUNT(*) queries instead of 3. A downside is that if a
project only has 5 confidential issues the counter will be set to 0.
Because we now have 3 similar counting service classes the code
previously used in Projects::ForksCountService has mostly been moved to
Projects::CountService, which in turn is reused by the various service
classes.
Fixes https://gitlab.com/gitlab-org/gitlab-ce/issues/36622
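A simplified sketch of the shared counting pattern, in plain Ruby with a Hash standing in for the cache. The class names here are hypothetical and much smaller than the real service classes; issues are represented as plain hashes:

```ruby
# Hypothetical base class: caches a count and recomputes it only on refresh.
class SketchCountService
  def initialize(cache)
    @cache = cache
  end

  # Return the cached value, computing and storing it on a cache miss.
  def count
    return @cache[cache_key] if @cache.key?(cache_key)

    refresh
  end

  # Recompute the value and overwrite the cache entry.
  def refresh
    @cache[cache_key] = uncached_count
  end

  def cache_key
    raise NotImplementedError
  end

  def uncached_count
    raise NotImplementedError
  end
end

# Hypothetical subclass counting open, non-confidential issues.
class SketchOpenIssuesCountService < SketchCountService
  def initialize(issues, cache)
    super(cache)
    @issues = issues
  end

  def cache_key
    'open_issues_count'
  end

  # Confidential issues are left out, matching the trade-off described above.
  def uncached_count
    @issues.count { |issue| issue[:state] == 'opened' && !issue[:confidential] }
  end
end
```

With only confidential issues open, `count` is 0, and a newly added public issue only shows up once `refresh` is called.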
Having two states that essentially mean the same thing is very much like
having a boolean "true" and boolean "mostly-true": it's rather silly.
This commit merges the "reopened" state into the "opened" state while
taking care of system notes still showing messages along the lines of
"Alice reopened this issue".
A big benefit of having only two states (opened and closed) is that
indexing and querying become simpler and more performant. For example,
to get all the opened issues we no longer have to query both states:

    SELECT *
    FROM issues
    WHERE project_id = 2
    AND state IN ('opened', 'reopened');

Instead we can query a single state directly, which can be much faster:

    SELECT *
    FROM issues
    WHERE project_id = 2
    AND state = 'opened';

Further, only having two states makes indexing easier as we will only
ever filter (and thus scan an index) using a single value. Partial
indexes could help, but MySQL doesn't support them, so they would
complicate the development process without helping MySQL at all.
For merge requests created after 9.4, we have a `merge_request_diff_commits`
table we can get all the SHAs from very quickly. We just need to exclude these
when we load from the legacy format, by ignoring diffs with no serialised
commits.
Once these have been migrated in the background, every MR will see this
improvement.
This is an ID-less table with just three columns: an association to the merge
request diff the commit belongs to, the relative order of the commit within the
merge request diff, and the commit SHA itself.
Previously we stored much more information about the commits, so that we could
display them even when they were deleted from the repo. Since 8.0, we ensure
that those commits are kept around for as long as the target repo itself is, so
we don't need to duplicate that data in the database.
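To illustrate the shape of the table, a row can be modelled as a three-field struct. The struct and helper below are hypothetical names for this sketch, with field names following the description above:

```ruby
# Hypothetical stand-in for one row: no ID column, just three fields.
SketchDiffCommit = Struct.new(:merge_request_diff_id, :relative_order, :sha)

# Return the SHAs of one diff in their stored relative order.
def commit_shas_for(rows, merge_request_diff_id)
  rows
    .select { |row| row.merge_request_diff_id == merge_request_diff_id }
    .sort_by(&:relative_order)
    .map(&:sha)
end
```

Everything else about a commit (author, message, dates) is looked up in the repository by SHA rather than duplicated in the database.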
This is allowed for existing instances so we don't end up with 76 offenses
right away, but for new code one should _only_ use this if they _have_
to remove non database data. Even then it's usually better to do this in
a service class as this gives you more control over how to remove the
data (e.g. in bulk).
This removes the need for relying on Rails' "dependent" option for data
removal, which is _incredibly_ slow (even when using :delete_all) when
deleting large amounts of data. This also ensures data consistency is
enforced on DB level and not on application level (something Rails is
really bad at).
This commit also includes various migrations to add foreign keys to
tables that eventually point to "projects" to ensure no rows get
orphaned upon removing a project.
Fix https://gitlab.com/gitlab-org/gitlab-ce/issues/27070
Deprecate "chat commands" in favor of "slash commands"
We looked for things like:
- `slash command`
- `slash_command`
- `slash-command`
- `SlashCommand`
I don't know why this happens exactly, but given an upstream and fork repository
from a customer, both of which required GC, resolving conflicts would corrupt
the fork so badly that it couldn't be cloned.
This isn't a perfect fix for that case, because the MR may still need to be
merged manually, but it does ensure that the repository is at least usable.
My best guess is that when we generate the index for the conflict
resolution (which we previously did in the target project), we obtain a
reference to an OID that doesn't exist in the source, even though we already
fetch the refs from the target into the source.
Explicitly setting the source project as the place to get the merge index from
seems to prevent repository corruption in this way.
The problem is that we often go via a diff object constructed from the diffs
stored in the DB. Those diffs, by definition, don't overflow, so we don't have
access to the 'correct' `real_size` - that is stored on the MR diff object
itself.