As discussed in
https://www.postgresql.org/message-id/9922.1353433645%40sss.pgh.pa.us,
the PostgreSQL window function last_value may not consider the
right rows:
Note that first_value, last_value, and nth_value consider only the rows
within the "window frame", which by default contains the rows from the
start of the partition through the last peer of the current row. This is
likely to give unhelpful results for last_value and sometimes also
nth_value. You can redefine the frame by adding a suitable frame
specification (RANGE or ROWS) to the OVER clause. See Section 4.2.8 for
more information about frame specifications.
This query could be fixed by adding `RANGE BETWEEN UNBOUNDED PRECEDING
AND UNBOUNDED FOLLOWING`, but that's quite verbose. It's simpler just to
use the first_value function.
Closes https://gitlab.com/gitlab-org/gitlab-ce/issues/59985
'Populate cluster kubernetes namespace' was using factories for their
specs. According to our documentation (see spec/migrations/readme.md),
we should use table helper to create a temproary ActiveRecord::Base
derived model for a table.
On GitLab.com, we saw numerous duplicate disk entry inserts because
the migration was not taking the routes table into account. We now
implement this in the migration to be consistent.
Closes https://gitlab.com/gitlab-org/gitlab-ce/issues/56061
These are data columns that store runtime configuration
of build needed to execute it on runner and within pipeline.
The definition of this data is that once used, and when no longer
needed (due to retry capability) they can be freely removed.
They use `jsonb` on PostgreSQL, and `text` on MySQL (due to lacking
support for json datatype on old enough version).
spec/lib/gitlab/background_migration/migrate_stage_status_spec.rb:9: warning: already initialized constant STATUSES
spec/lib/gitlab/background_migration/migrate_build_stage_spec.rb:9: warning: previous definition of STATUSES was here
If the EncryptColumns background migration runs in a sidekiq with a
stale view of the database schema, or when the purported destination
columns don't actually exist, data loss can result. Attempt to work
around these issues by reloading schema information before running
the migration, and raising errors if the model reports that any of its
source or destination columns are missing.
It's possible that user pastes accidentally also unsubscribe link
which is included in footer of notification emails. This unsubscribe
link contains personal token which attacker then use to act as the
original user (e.g. for sending comments under his/her identity).
Cleanup code, and refactor tests that still use Rugged. After this, there should
be no Rugged code that access the instance's repositories on non-test
environments. There is still some rugged code for other tasks like the
repository import task, but since it doesn't access any repository storage path
it can stay.
This also adds specs for 3 distinct situations:
2. When there is an unknown pipeline with just a build
3. When there is an unknown pipeline with no statuses
1. When there is an unknown pipeline with a build and a status
[10.6] Prevent notes on confidential issues from being sent to chat
See merge request gitlab/gitlabhq!2366
# Conflicts:
# app/helpers/services_helper.rb
We still have >100K unmigrated MergeRequestDiffs
which don't have commits_count set yet (see
https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/17567#note_61904891)
This migration re-schedules the original background migration.
To assure that records are not processed twice, records with
commits_count set are skipped.
Related to #41698 and !17567
And use :migration tag to use deletion strategy, and to avoid caching tables, and to lock into a particular schema.
Attempting to fix intermittent spec errors `PG::UndefinedTable: ERROR: relation "public.untracked_files_for_uploads" does not exist`.
If the schema changes after 20171114162227 for any of these models, and specs
after this one use factories, then those factories will use the models with
outdated column information cached.
We shouldn't really use factories in migration specs, but this is a special case
because there is a lot of git-related setup code in the model that would be
painful to copy to the migration. Instead, we just manually reset the column
information for the models we could pollute.
For each MR diff an extra 'SELECT COUNT()' is executed
to get number of commits for the diff. Overall time to get counts for
all MR diffs may be quite expensive. To speed up loading of MR info,
information about number of commits is stored in a MR diff's extra column.
Closes#38068
If the payload cannot be created for some reason, we could be left with a nil
push event payload, which causes Error 500s when viewing the dashboard. Guard
against this error and log when it happens.
Avoids problems seen in #38823
Instead of using the factories. Since the factories might be using
columns that aren't available in the schema at version the particular
spec is running in.
Later migrations added fields to the EE DB which were used by factories which were used in these specs.
And in CE on MySQL, a single appearance row is enforced.
The migration and migration specs should not depend on the codebase staying the same.
* Hopefully fixes spec failures in which the table doesn’t exist
* Decouples the background migration from the post-deploy migration, e.g. we could easily run it again even though the table is dropped when finished.
So the path on source installs cannot be too long for our column.
And fix the column length test since Route.path is limited to 255 chars, it doesn’t matter how many nested groups there are.
That way we can join forks-of-forks into the same network even if
their original source was deleted.
Consider the following:
`awesome-oss/badger` is forked to `coolstuff/badger`, which is
forked to `user-a/badger` which in turn is forked to
`user-b/badger`.
When `awesome-oss/badger` is deleted, we will now create a fork
network with `coolstuff/badger` as the root. The `user-a/badger`
and `user-b/badger` projects will be added to the network.
The st_commits and st_diffs columns on merge_request_diffs historically held the
YAML-serialised data for a merge request diff, in a variety of formats.
Since 9.5, these have been migrated in the background to two new tables:
merge_request_diff_commits and merge_request_diff_files. That has the advantage
that we can actually query the data (for instance, to find out how many commits
we've stored), and that it can't be in a variety of formats, but must match the
new schema.
This is the final step of that journey, where we drop those columns and remove
all references to them. This is a breaking change to the importer, because we
can no longer import diffs created in the old format, and we cannot guarantee
the export will be in the new format unless it was generated after this commit.
Before the `PopulateForkNetworksRange` spec would also call the
`CreateForkNetworkMemberships` which we would count on in the spec.
With this, I'm isolating that, and counting only records created in
this particular migration instead.
In case the root project of a Fork-of-fork is deleted, the ForkNetwork
and the membership for that fork network is never created. In this
case we shouldn't try to create the membership, since the parent
membership will never be created.
This means that these fork networks will be lost.
Replaces all the explicit include metadata syntax in the specs (tag:
true) into the implicit one (:tag).
Added a cop to prevent future errors and handle autocorrection.
This version does not use transactions, but individual statements. As we have
unique constraints on the target tables for the inserts, we can just ignore
uniqueness violations there (as long as we always insert the same batch size, in
the same order).
This means the spec now must use truncation, not a transaction, as the
uniqueness violation means that the whole transaction for that spec would be
invalid, which isn't what we'd want. In real-world use, this isn't run in a
transaction anyway.
This commit also wraps unhandled exceptions, for easier finding in Sentry, and
logs with a consistent format, for easier searching.
We were hitting the statement timeout for very large MR diffs. Now we insert at
most 1,000 rows to `merge_request_diff_commits` in a single statement, or 100
rows to `merge_request_diff_files`.
This finishes the procedure for migrating events from the old format
into the new format. Code no longer uses the old setup and the database
tables used during the migration process are swapped, with the old table
being dropped.
While the database migration can be reversed this will 1) take a lot of
time as data has to be coped around 2) won't restore data in the
"events.data" column as we have no way of restoring this.
Fixes https://gitlab.com/gitlab-org/gitlab-ce/issues/37241
Guess the modes based on the following:
1. If the file didn't exist, it's zero.
2. If the diff contains 'Subproject commit', it might be a submodule, so 0600.
3. Otherwise, it's 0644.
This isn't perfect, but it doesn't have to be - it won't change file modes in
the repository.
This commit migrates events data in such a way that push events are
stored much more efficiently. This is done by creating a shadow table
called "events_for_migration", and a table called "push_event_payloads"
which is used for storing push data of push events. The background
migration in this commit will copy events from the "events" table into
the "events_for_migration" table, push events in will also have a row
created in "push_event_payloads".
This approach allows us to reclaim space in the next release by simply
swapping the "events" and "events_for_migration" tables, then dropping
the old events (now "events_for_migration") table.
The new table structure is also optimised for storage space, and does
not include the unused "title" column nor the "data" column (since this
data is moved to "push_event_payloads").
== Newly Created Events
Newly created events are inserted into both "events" and
"events_for_migration", both using the exact same primary key value. The
table "push_event_payloads" in turn has a foreign key to the _shadow_
table. This removes the need for recreating and validating the foreign
key after swapping the tables. Since the shadow table also has a foreign
key to "projects.id" we also don't have to worry about orphaned rows.
This approach however does require some additional storage as we're
duplicating a portion of the events data for at least 1 release. The
exact amount is hard to estimate, but for GitLab.com this is expected to
be between 10 and 20 GB at most. The background migration in this commit
deliberately does _not_ update the "events" table as doing so would put
a lot of pressure on PostgreSQL's auto vacuuming system.
== Supporting Both Old And New Events
Application code has also been adjusted to support push events using
both the old and new data formats. This is done by creating a PushEvent
class which extends the regular Event class. Using Rails' Single Table
Inheritance system we can ensure the right class is used for the right
data, which in this case is based on the value of `events.action`. To
support displaying old and new data at the same time the PushEvent class
re-defines a few methods of the Event class, falling back to their
original implementations for push events in the old format.
Once all existing events have been migrated the various push event
related methods can be removed from the Event model, and the calls to
`super` can be removed from the methods in the PushEvent model.
The UI and event atom feed have also been slightly changed to better
handle this new setup, fortunately only a few changes were necessary to
make this work.
== API Changes
The API only displays push data of events in the new format. Supporting
both formats in the API is a bit more difficult compared to the UI.
Since the old push data was not really well documented (apart from one
example that used an incorrect "action" nmae) I decided that supporting
both was not worth the effort, especially since events will be migrated
in a few days _and_ new events are created in the correct format.
Previously, we stored these as serialised fields - `st_{commits,diffs}` - on the
`merge_request_diffs` table. These now have their own tables -
`merge_request_diff_{commits,diffs}` - with a column for each attribute of the
serialised data.
Add a background migration to go through the existing MR diffs and migrate them
to the new format. Ignore any contents that cannot be displayed. Assuming that
we have 5 million rows to migrate, and each batch of 2,500 rows can be
completed in 5 minutes, this will take about 7 days to migrate everything.