The migration steals the remaining background jobs that populate MRs
with assignees, executes them synchronously, and then makes sure that
all the assignees are migrated.
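As a sketch of the pattern (the migration class name, the final sweep, and the job's range signature are assumptions for illustration; `Gitlab::BackgroundMigration.steal` is the existing helper for this):

```ruby
# Post-deploy migration sketch: pull the remaining Sidekiq jobs for this
# background migration and run them inline, then do one final pass.
class FinishMergeRequestAssigneesMigration < ActiveRecord::Migration[5.1]
  DOWNTIME = false

  disable_ddl_transaction!

  def up
    # GitLab helper: takes queued/scheduled jobs for the given background
    # migration off Sidekiq and executes them synchronously here.
    Gitlab::BackgroundMigration.steal('PopulateMergeRequestAssigneesTable')

    # Safety net: run the job body once more over the full id range to
    # catch any rows the stolen jobs may have missed.
    min_id = MergeRequest.minimum(:id)
    max_id = MergeRequest.maximum(:id)

    if min_id && max_id
      Gitlab::BackgroundMigration::PopulateMergeRequestAssigneesTable
        .new
        .perform(min_id, max_id)
    end
  end

  def down
    # Data migration; nothing to undo.
  end
end
```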
On the assumption that a background migration whose specs need a schema
older than 2018 is obsoleted by this migration squash, we can remove
both specs and code for those that fail to run in CI as a result of the
schema at that date no longer existing.
This is true for all but the MigrateStageStatus background migration,
which is also used from the MigrateBuildStage background migration.
This backports all EE schema changes to CE, including EE migrations,
ensuring both use the same schema.
== Updated tests
A spec related to ghost and support bot users had to be modified to make
it pass. The spec in question assumes that the "support_bot" column
exists when defining the spec. In the single codebase setup this is not
the case, as the column is backported in a later migration. Any attempt
to use a different schema version or use of "around" blocks to
conditionally disable specs won't help, as reverting the backport
migration would also drop the "support_bot" column. Removing the
"support_bot" tests entirely appears to be the only solution.
We also need to update some foreign key tests now that we have
backported the EE columns. Fortunately, these changes are very minor.
== Backporting migrations
This commit moves EE specific migrations (except those for the Geo
tracking database) and related files to CE, and also removes any traces
of the ee/db directory.
Some migrations had to be modified or removed, as they no longer work
with the schema being backported. These migrations were all quite old,
so we opted for removing them where modifying them would take too much
time and effort.
Some old migrations were modified in EE while also existing in CE. In
these cases we took the EE code, and in one case removed the migration
entirely. It's not worth spending time trying to merge these changes
somehow, as we plan to remove old migrations around the release of 12.0;
see https://gitlab.com/gitlab-org/gitlab-ce/issues/59177 for more details.
We need to stub default_git_depth and default_git_depth= because some
old migration specs try to create a record using a schema from before
that column was introduced.
The `let!` calls were executed before the `before` hook, which still
caused some factories to fail, so those records are now created in the
`before` hook as well.
Introduce default_git_depth in the project's CI/CD settings and set it to
50. Use it if there is no GIT_DEPTH variable specified. Apply this
default only to newly created projects and keep it nil for old ones,
in order to not break pipelines that rely on non-shallow clones.
default_git_depth can be updated from CI/CD Settings in the UI, and must
be either nil or an integer between 0 and 1000 (inclusive).
Inherit default_git_depth from the origin project when forking projects.
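A minimal validation sketch for that rule (the model name matches the CI/CD settings table, but treat the exact options as an assumption):

```ruby
class ProjectCiCdSetting < ApplicationRecord
  # nil means "no default": old projects keep non-shallow clones. When
  # set, the value must be an integer in 0..1000 (inclusive).
  validates :default_git_depth,
    numericality: {
      only_integer: true,
      greater_than_or_equal_to: 0,
      less_than_or_equal_to: 1000
    },
    allow_nil: true
end
```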
MR pipelines are run on an MR ref (refs/merge-requests/:iid/merge), which
contains a unique commit (i.e. the merge commit) that doesn't exist in the
other branch/tag refs. We need to add it because otherwise it may break
pipelines for old projects that have already enabled Pipelines for merge
results and have a git depth of 0.
Document new default_git_depth project CI/CD setting
Adds migrations to reset the merge_status of opened,
mergeable MRs. That's required by
https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/28513
so we're able to sync the status update along with the merge-ref,
without leaving MRs with a stale merge-ref.
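Sketched as a migration (a production version would batch the UPDATE; the condition values follow the description above):

```ruby
class ResetMergeStatus < ActiveRecord::Migration[5.1]
  DOWNTIME = false

  def up
    # Flip opened, mergeable MRs back to 'unchecked' so the next status
    # check recomputes merge_status and refreshes the merge-ref with it.
    execute(<<~SQL)
      UPDATE merge_requests
      SET merge_status = 'unchecked'
      WHERE state = 'opened'
        AND merge_status = 'can_be_merged'
    SQL
  end

  def down
    # Irreversible data change; statuses are recomputed on demand anyway.
  end
end
```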
As discussed in
https://www.postgresql.org/message-id/9922.1353433645%40sss.pgh.pa.us,
the PostgreSQL window function last_value may not consider the
right rows:
Note that first_value, last_value, and nth_value consider only the rows
within the "window frame", which by default contains the rows from the
start of the partition through the last peer of the current row. This is
likely to give unhelpful results for last_value and sometimes also
nth_value. You can redefine the frame by adding a suitable frame
specification (RANGE or ROWS) to the OVER clause. See Section 4.2.8 for
more information about frame specifications.
This query could be fixed by adding `RANGE BETWEEN UNBOUNDED PRECEDING
AND UNBOUNDED FOLLOWING`, but that's quite verbose. It's simpler just to
use the first_value function.
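For illustration, the two formulations side by side (the table and columns here are hypothetical):

```ruby
# BAD: with the default frame (start of partition .. current row and its
# peers), last_value(id) effectively returns the current row's own id.
ActiveRecord::Base.connection.select_values(<<~SQL)
  SELECT last_value(id) OVER (PARTITION BY project_id ORDER BY id)
  FROM merge_requests
SQL

# GOOD: order descending and take first_value, which is always inside
# the default frame, so no frame clause is needed.
ActiveRecord::Base.connection.select_values(<<~SQL)
  SELECT first_value(id) OVER (PARTITION BY project_id ORDER BY id DESC)
  FROM merge_requests
SQL
```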
Closes https://gitlab.com/gitlab-org/gitlab-ce/issues/59985
'Populate cluster kubernetes namespace' was using factories in its
specs. According to our documentation (see spec/migrations/readme.md),
we should use the table helper to create a temporary ActiveRecord::Base
derived model for a table.
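A sketch of that pattern from the README (the class name and the columns are assumptions for illustration):

```ruby
describe Gitlab::BackgroundMigration::PopulateClusterKubernetesNamespaceTable, :migration do
  # `table` builds a throwaway ActiveRecord::Base subclass bound to the
  # schema at this migration's version, instead of using the app model.
  let(:namespaces) { table(:namespaces) }
  let(:projects)   { table(:projects) }
  let(:clusters)   { table(:clusters) }

  before do
    namespaces.create!(id: 1, name: 'group', path: 'group')
    projects.create!(id: 1, name: 'project', path: 'project', namespace_id: 1)
    clusters.create!(id: 1, name: 'cluster')
  end
end
```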
On GitLab.com, we saw numerous duplicate disk entry inserts because
the migration was not taking the routes table into account. The
migration now takes routes into account as well, to stay consistent.
Closes https://gitlab.com/gitlab-org/gitlab-ce/issues/56061
These data columns store the runtime configuration of a build that is
needed to execute it on a runner and within a pipeline.
Once this data has been used, and is no longer needed (it is retained
to support retries), it can be freely removed.
The columns use `jsonb` on PostgreSQL, and `text` on MySQL (whose old
enough versions lack support for a JSON datatype).
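A sketch of how such a column pair might be added (the table and column names are assumptions):

```ruby
class AddConfigColumnsToBuildsMetadata < ActiveRecord::Migration[4.2]
  DOWNTIME = false

  def change
    # Queryable, compact jsonb where available; plain text on MySQL,
    # whose supported versions lack a usable JSON datatype.
    column_type = Gitlab::Database.postgresql? ? :jsonb : :text

    add_column :ci_builds_metadata, :config_options, column_type
    add_column :ci_builds_metadata, :config_variables, column_type
  end
end
```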
spec/lib/gitlab/background_migration/migrate_stage_status_spec.rb:9: warning: already initialized constant STATUSES
spec/lib/gitlab/background_migration/migrate_build_stage_spec.rb:9: warning: previous definition of STATUSES was here
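These warnings occur because both spec files define STATUSES at the top level, so whichever file loads second clobbers the first. One possible fix (a sketch; the example values are made up) is to scope the value with RSpec's `stub_const` instead of a bare constant:

```ruby
describe Gitlab::BackgroundMigration::MigrateStageStatus, :migration do
  before do
    # Scoped to this example group only; no top-level constant is defined
    # at load time, so the sibling spec file can safely do the same.
    stub_const('STATUSES', { created: 0, pending: 1, running: 2 }.freeze)
  end
end
```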
If the EncryptColumns background migration runs in a Sidekiq process
with a stale view of the database schema, or when the purported destination
columns don't actually exist, data loss can result. Attempt to work
around these issues by reloading schema information before running
the migration, and raising errors if the model reports that any of its
source or destination columns are missing.
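A sketch of the guard (the method name and the job's exact structure are assumptions):

```ruby
# Inside the EncryptColumns job, before touching any rows:
def verify_schema!(model, attributes)
  # This Sidekiq process may have booted before the columns were added;
  # drop the cached schema and re-read it from the database.
  model.reset_column_information
  model.define_attribute_methods

  attributes.each do |plain_column, encrypted_column|
    [plain_column, encrypted_column].each do |name|
      unless model.column_names.include?(name.to_s)
        raise "#{model.name} is missing column #{name}; aborting to avoid data loss"
      end
    end
  end
end
```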
It's possible that a user accidentally pastes an unsubscribe link,
which is included in the footer of notification emails. This unsubscribe
link contains a personal token which an attacker could then use to act as
the original user (e.g. to send comments under their identity).
Clean up code, and refactor tests that still use Rugged. After this, there
should be no Rugged code that accesses the instance's repositories in
non-test environments. There is still some Rugged code for other tasks
like the repository import task, but since it doesn't access any
repository storage path it can stay.
This also adds specs for 3 distinct situations:
1. When there is an unknown pipeline with a build and a status
2. When there is an unknown pipeline with just a build
3. When there is an unknown pipeline with no statuses
[10.6] Prevent notes on confidential issues from being sent to chat
See merge request gitlab/gitlabhq!2366
# Conflicts:
# app/helpers/services_helper.rb
We still have >100K unmigrated MergeRequestDiffs
which don't have commits_count set yet (see
https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/17567#note_61904891)
This migration re-schedules the original background migration.
To ensure that records are not processed twice, records with
commits_count set are skipped.
Related to #41698 and !17567
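A sketch of the re-scheduling migration (the job name, batch size, and interval are illustrative):

```ruby
class RescheduleCommitsCountForMergeRequestDiffs < ActiveRecord::Migration[4.2]
  include Gitlab::Database::MigrationHelpers

  DOWNTIME = false
  MIGRATION = 'AddMergeRequestDiffCommitsCount'.freeze
  BATCH_SIZE = 5_000
  DELAY_INTERVAL = 5.minutes

  disable_ddl_transaction!

  class MergeRequestDiff < ActiveRecord::Base
    include EachBatch

    self.table_name = 'merge_request_diffs'
  end

  def up
    # Only rows the original run never reached: anything with
    # commits_count already set is skipped.
    relation = MergeRequestDiff.where(commits_count: nil)

    queue_background_migration_jobs_by_range_at_intervals(
      relation, MIGRATION, DELAY_INTERVAL, batch_size: BATCH_SIZE
    )
  end

  def down
    # Re-scheduling is additive; nothing to undo.
  end
end
```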
And use the :migration tag to use the deletion strategy, avoid caching tables, and lock into a particular schema.
Attempting to fix intermittent spec errors `PG::UndefinedTable: ERROR: relation "public.untracked_files_for_uploads" does not exist`.
If the schema changes after 20171114162227 for any of these models, and specs
after this one use factories, then those factories will use the models with
outdated column information cached.
We shouldn't really use factories in migration specs, but this is a special case
because there is a lot of git-related setup code in the model that would be
painful to copy to the migration. Instead, we just manually reset the column
information for the models we could pollute.
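The manual reset itself is cheap (a sketch; which models need it depends on the factories involved):

```ruby
after do
  # Flush column caches polluted by this spec's schema, so later specs
  # that use factories see the current schema again.
  [Project, Route, Namespace].each(&:reset_column_information)
end
```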
For each MR diff an extra 'SELECT COUNT()' is executed
to get the number of commits for the diff. Getting the counts for
all MR diffs can therefore be quite expensive. To speed up loading of MR
info, the number of commits is stored in an extra column on the MR diff.
Closes #38068
If the payload cannot be created for some reason, we could be left with a nil
push event payload, which causes Error 500s when viewing the dashboard. Guard
against this error and log when it happens.
Avoids problems seen in #38823
Instead of using the factories, since the factories might be using
columns that aren't available in the schema version the particular
spec is running against.
Later migrations added fields to the EE DB that the factories used in these specs relied on.
And in CE on MySQL, a single appearance row is enforced.
The migration and migration specs should not depend on the codebase staying the same.
* Hopefully fixes spec failures in which the table doesn’t exist
* Decouples the background migration from the post-deploy migration, e.g. we could easily run it again even though the table is dropped when finished.
So the path on source installs cannot be too long for our column.
And fix the column length test: since Route.path is limited to 255 chars, it doesn't matter how many nested groups there are.
That way we can join forks-of-forks into the same network even if
their original source was deleted.
Consider the following:
`awesome-oss/badger` is forked to `coolstuff/badger`, which is
forked to `user-a/badger` which in turn is forked to
`user-b/badger`.
When `awesome-oss/badger` is deleted, we will now create a fork
network with `coolstuff/badger` as the root. The `user-a/badger`
and `user-b/badger` projects will be added to the network.
The st_commits and st_diffs columns on merge_request_diffs historically held the
YAML-serialised data for a merge request diff, in a variety of formats.
Since 9.5, these have been migrated in the background to two new tables:
merge_request_diff_commits and merge_request_diff_files. That has the advantage
that we can actually query the data (for instance, to find out how many commits
we've stored), and that it can't be in a variety of formats, but must match the
new schema.
This is the final step of that journey, where we drop those columns and remove
all references to them. This is a breaking change to the importer, because we
can no longer import diffs created in the old format, and we cannot guarantee
the export will be in the new format unless it was generated after this commit.
Before, the `PopulateForkNetworksRange` spec would also call
`CreateForkNetworkMemberships`, whose records we were counting in the
spec. With this, I'm isolating that and counting only records created
in this particular migration.
In case the root project of a Fork-of-fork is deleted, the ForkNetwork
and the membership for that fork network is never created. In this
case we shouldn't try to create the membership, since the parent
membership will never be created.
This means that these fork networks will be lost.
Replaces all the explicit include metadata syntax in the specs
(`tag: true`) with the implicit one (`:tag`).
Added a cop to prevent future errors and handle autocorrection.
This version does not use transactions, but individual statements. As we have
unique constraints on the target tables for the inserts, we can just ignore
uniqueness violations there (as long as we always insert the same batch size, in
the same order).
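In outline, using GitLab's bulk-insert helper (treat the surrounding details as a sketch):

```ruby
# Insert one batch of rows, tolerating a retried batch that was already
# written by an earlier attempt. `rows` is an array of attribute hashes.
def insert_batch(table, rows)
  Gitlab::Database.bulk_insert(table, rows)
rescue ActiveRecord::RecordNotUnique
  # The unique constraint proves this batch is already present (batch
  # boundaries and order never change), so it is safe to ignore.
end
```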
This means the spec now must use truncation, not a transaction, as the
uniqueness violation means that the whole transaction for that spec would be
invalid, which isn't what we'd want. In real-world use, this isn't run in a
transaction anyway.
This commit also wraps unhandled exceptions, for easier finding in Sentry, and
logs with a consistent format, for easier searching.
We were hitting the statement timeout for very large MR diffs. Now we insert at
most 1,000 rows to `merge_request_diff_commits` in a single statement, or 100
rows to `merge_request_diff_files`.
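Roughly (the sizes follow the text above; the method and variable names are illustrative):

```ruby
# Keep each INSERT small enough to finish well within the statement
# timeout: at most 1,000 commit rows or 100 file rows per statement.
def insert_diff_rows(commit_rows, file_rows)
  commit_rows.each_slice(1_000) do |slice|
    Gitlab::Database.bulk_insert('merge_request_diff_commits', slice)
  end

  file_rows.each_slice(100) do |slice|
    Gitlab::Database.bulk_insert('merge_request_diff_files', slice)
  end
end
```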
This finishes the procedure for migrating events from the old format
into the new format. Code no longer uses the old setup and the database
tables used during the migration process are swapped, with the old table
being dropped.
While the database migration can be reversed, this will 1) take a lot of
time as data has to be copied around and 2) won't restore data in the
"events.data" column, as we have no way of restoring this.
Fixes https://gitlab.com/gitlab-org/gitlab-ce/issues/37241
Guess the modes based on the following:
1. If the file didn't exist, it's zero.
2. If the diff contains 'Subproject commit', it might be a submodule, so 0600.
3. Otherwise, it's 0644.
This isn't perfect, but it doesn't have to be - it won't change file modes in
the repository.
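As code, the heuristic is just (a sketch of the rules above):

```ruby
# Guess a file mode for a legacy diff where the real mode wasn't stored.
def guess_mode(file_exists, diff)
  return 0 unless file_exists                        # 1. file didn't exist
  return 0o600 if diff.include?('Subproject commit') # 2. likely submodule

  0o644                                              # 3. everything else
end
```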
This commit migrates events data in such a way that push events are
stored much more efficiently. This is done by creating a shadow table
called "events_for_migration", and a table called "push_event_payloads"
which is used for storing push data of push events. The background
migration in this commit will copy events from the "events" table into
the "events_for_migration" table, push events in will also have a row
created in "push_event_payloads".
This approach allows us to reclaim space in the next release by simply
swapping the "events" and "events_for_migration" tables, then dropping
the old events (now "events_for_migration") table.
The new table structure is also optimised for storage space, and does
not include the unused "title" column nor the "data" column (since this
data is moved to "push_event_payloads").
== Newly Created Events
Newly created events are inserted into both "events" and
"events_for_migration", both using the exact same primary key value. The
table "push_event_payloads" in turn has a foreign key to the _shadow_
table. This removes the need for recreating and validating the foreign
key after swapping the tables. Since the shadow table also has a foreign
key to "projects.id" we also don't have to worry about orphaned rows.
This approach however does require some additional storage as we're
duplicating a portion of the events data for at least 1 release. The
exact amount is hard to estimate, but for GitLab.com this is expected to
be between 10 and 20 GB at most. The background migration in this commit
deliberately does _not_ update the "events" table as doing so would put
a lot of pressure on PostgreSQL's auto vacuuming system.
== Supporting Both Old And New Events
Application code has also been adjusted to support push events using
both the old and new data formats. This is done by creating a PushEvent
class which extends the regular Event class. Using Rails' Single Table
Inheritance system we can ensure the right class is used for the right
data, which in this case is based on the value of `events.action`. To
support displaying old and new data at the same time the PushEvent class
re-defines a few methods of the Event class, falling back to their
original implementations for push events in the old format.
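In outline, the STI hookup could look like this (a sketch; the action value and method details are assumptions):

```ruby
class Event < ActiveRecord::Base
  PUSHED = 5 # illustrative value of events.action for push events

  # Use `action` instead of the conventional `type` column for STI.
  self.inheritance_column = 'action'

  # Called when instantiating rows: route pushes to PushEvent, keep
  # everything else as a plain Event.
  def self.find_sti_class(action)
    action.to_i == PUSHED ? PushEvent : Event
  end
end

class PushEvent < Event
  # Makes PushEvent queries add `WHERE action = 5` and new records
  # persist the right action value.
  def self.sti_name
    PUSHED
  end
end
```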
Once all existing events have been migrated the various push event
related methods can be removed from the Event model, and the calls to
`super` can be removed from the methods in the PushEvent model.
The UI and the event Atom feed have also been slightly changed to better
handle this new setup; fortunately, only a few changes were necessary to
make this work.
== API Changes
The API only displays push data of events in the new format. Supporting
both formats in the API is a bit more difficult compared to the UI.
Since the old push data was not really well documented (apart from one
example that used an incorrect "action" name) I decided that supporting
both was not worth the effort, especially since events will be migrated
in a few days _and_ new events are created in the correct format.
Previously, we stored these as serialised fields - `st_{commits,diffs}` - on the
`merge_request_diffs` table. These now have their own tables -
`merge_request_diff_{commits,files}` - with a column for each attribute of the
serialised data.
Add a background migration to go through the existing MR diffs and migrate them
to the new format. Ignore any contents that cannot be displayed. Assuming that
we have 5 million rows to migrate, and each batch of 2,500 rows can be
completed in 5 minutes, this will take about 7 days to migrate everything.