As per https://gitlab.com/gitlab-org/gitlab-ce/issues/46043, project
templates should be squashed before updating, so that repositories
created from these templates don't include the full history of the
backing repository.
This rake task had been broken for a while. This fixes the breakages,
adds a test to help avoid future breakages, and adds a few ergonomic
improvements to the task itself.
In some cases ActiveSession.cleanup was not called after authentication,
so for some users the ActiveSession lookup keys grew without ever being
cleaned up. This Rake task manually iterates over the lookup keys and
removes the ones without an existing ActiveSession.
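The cleanup described above can be sketched roughly as follows. This is
only an illustration: the lookup keys and sessions are modelled as plain
in-memory Ruby objects, and `clean_lookup_keys` is a hypothetical name,
not the method the task actually uses.

```ruby
require 'set'

# Hypothetical stand-ins for the stored data (assumption, not GitLab's layout):
#   lookup           - user ID => session IDs recorded in that user's lookup key
#   live_session_ids - IDs that still have an ActiveSession entry
def clean_lookup_keys(lookup, live_session_ids)
  # Keep only the session IDs that still point at an existing session.
  cleaned = lookup.transform_values do |session_ids|
    session_ids.select { |id| live_session_ids.include?(id) }
  end

  # Drop lookup keys that no longer reference any session at all.
  cleaned.reject { |_user_id, session_ids| session_ids.empty? }
end

lookup = { 1 => %w[a b c], 2 => %w[d] }
live_session_ids = Set.new(%w[a c])
cleaned = clean_lookup_keys(lookup, live_session_ids)
```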
This adds the rake task rake
gitlab:cleanup:orphan_job_artifact_files. This rake task cleans all
orphan job artifact files it can find on disk.
It searches the complete folder of all artifacts on disk. Then it
selects all the job artifact IDs for which it could not find a record
with a matching ID in the database. For these, the file is deleted
from disk.
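A minimal sketch of that selection step, under stated assumptions: the
path layout (`.../<artifact ID>/<file name>`), the helper names, and the
in-memory set of database IDs are all hypothetical, and the sketch only
reports what it would delete instead of touching the disk.

```ruby
require 'set'

# Assumed layout: the directory that holds the file is named after the
# job artifact ID. The real on-disk layout may differ.
def artifact_id_from(path)
  Integer(File.basename(File.dirname(path)), 10)
end

# Returns the files whose artifact ID has no matching database record.
def orphan_artifact_files(paths, existing_ids)
  paths.reject { |path| existing_ids.include?(artifact_id_from(path)) }
end

paths = [
  '/artifacts/aa/bb/101/artifacts.zip',
  '/artifacts/aa/bb/102/metadata.gz',
  '/artifacts/cc/dd/205/artifacts.zip'
]
existing_ids = Set[101, 205] # IDs that would come from the database

orphans = orphan_artifact_files(paths, existing_ids)
orphans.each { |path| puts "Would delete #{path}" }
```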
The various LDAP check Rake tasks have long supported a SANITIZE
environment variable. When present, identifiable information such as
user names and project/group names is obscured. Until now,
the LDAP check did not honor this. Now it will only say how many
users were found. This should at least give the indication that
the LDAP configuration found something, but will not leak what
it is. Resolves #56131
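As a rough illustration of the behavior described above (the method
name, the user hash shape, and the output wording are assumptions, not
the task's real code):

```ruby
# Hypothetical helper: builds the lines the check would print.
def ldap_check_lines(users, sanitize:)
  # With sanitizing on, report only how many users were found.
  return ["User output sanitized. Found #{users.length} users."] if sanitize

  users.map { |user| "DN: #{user[:dn]}\t#{user[:uid]}" }
end

users = [
  { dn: 'uid=john,ou=people,dc=example,dc=com', uid: 'john' },
  { dn: 'uid=mary,ou=people,dc=example,dc=com', uid: 'mary' }
]

# Assumption: the mere presence of SANITIZE switches sanitizing on.
sanitize = ENV.key?('SANITIZE')
ldap_check_lines(users, sanitize: sanitize).each { |line| puts line }
```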
We've already migrated all the legacy artifacts to the new realm,
which is the ci_job_artifacts table.
It's time to remove the old code base that is no longer used.
It used to be the case that GitLab created symlinks for each repository
to one copy of the Git hooks, so these ran when required. This changed
to set the hooks dynamically on Gitaly when invoking Git.
The side effect is that we don't need all these symlinks anymore, and
Gitaly doesn't create them anymore either. That means the tests in
GitLab-Rails shouldn't test for them either.
Related: https://gitlab.com/gitlab-org/gitaly/issues/1392#note_175619926
This is a small polishing on the storage migration and storage rollback
rake tasks. By aborting a migration while a rollback is already
scheduled we want to prevent unexpected consequences.
Specs were reviewed and improved to better cover the current behavior.
There was some standardization done as well to facilitate the
implementation of the rollback functionality.
StorageMigratorWorker was extracted to the HashedStorage namespace, where
RollbackerWorker will live as well.
Pool repositories are persisted in the database, and when the DB is
restored, the data need to be restored on disk. This is done by
resetting the state machine and rescheduling the object pool creation.
This is not an exact replica of the state at the time the backup was
created. However, the data is consistent again.
Dumping isn't required as internally GitLab uses git bundles which
bundle all refs and include all objects in the bundle that they require,
reduplicating as more repositories get backed up. This does require more
data to be stored.
Fixes https://gitlab.com/gitlab-org/gitaly/issues/1355
Add an index to the `store` column on `uploads`. This makes counting
local uploads faster.
Also, there is no longer a need to check for objects with `store = NULL`.
See https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/18557
---
### Query plans
Query:
```sql
SELECT COUNT(*)
FROM "uploads"
WHERE ("uploads"."store" = ? OR "uploads"."store" IS NULL)
```
#### Without index
```
gitlabhq_production=# EXPLAIN ANALYZE SELECT uploads.* FROM uploads WHERE (uploads.store = 1 OR uploads.store IS NULL);
QUERY PLAN
---------------------------------------------------------------------------------------------------------------
Seq Scan on uploads (cost=0.00..601729.54 rows=578 width=272) (actual time=6.170..2308.256 rows=545 loops=1)
Filter: ((store = 1) OR (store IS NULL))
Rows Removed by Filter: 4411957
Planning time: 38.652 ms
Execution time: 2308.454 ms
(5 rows)
```
#### Add index
```
gitlabhq_production=# create index uploads_tmp1 on uploads (store);
CREATE INDEX
```
#### With index
```
gitlabhq_production=# EXPLAIN ANALYZE SELECT uploads.* FROM uploads WHERE (uploads.store = 1 OR uploads.store IS NULL);
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on uploads (cost=11.46..1238.88 rows=574 width=272) (actual time=0.155..0.577 rows=545 loops=1)
Recheck Cond: ((store = 1) OR (store IS NULL))
Heap Blocks: exact=217
-> BitmapOr (cost=11.46..11.46 rows=574 width=0) (actual time=0.116..0.116 rows=0 loops=1)
-> Bitmap Index Scan on uploads_tmp1 (cost=0.00..8.74 rows=574 width=0) (actual time=0.095..0.095 rows=545 loops=1)
Index Cond: (store = 1)
-> Bitmap Index Scan on uploads_tmp1 (cost=0.00..2.44 rows=1 width=0) (actual time=0.020..0.020 rows=0 loops=1)
Index Cond: (store IS NULL)
Planning time: 0.274 ms
Execution time: 0.637 ms
(10 rows)
```
Closes https://gitlab.com/gitlab-org/gitlab-ee/issues/6070
We started syncing all wikis regardless of whether they are disabled or
not. We couldn't do that in one stage because we needed a smooth update
and had to deprecate things first. This is the second stage that finally
removes unused columns in the geo_node_status table.
If doing a schema load, the post_migrations should also be marked as up,
even if SKIP_POST_DEPLOYMENT_MIGRATIONS was set, otherwise future
migration runs will be broken.
Rake tasks cleaning up the Git storage were still using direct disk
access, which won't work if the disks aren't attached. To mitigate this,
a migration issue was created.
To port gitlab:cleanup:dirs, and gitlab:cleanup:repos, a new RPC was
required, ListDirectories. This was implemented in Gitaly, through
https://gitlab.com/gitlab-org/gitaly/merge_requests/868.
To be able to use the new RPC the Gitaly server was bumped to v0.120.
This is an RPC that will not use feature gates, as this doesn't scale on
.com so there is no way to test it at scale. Furthermore, we _know_ it
doesn't scale, but this might be a useful task for smaller instances.
Lastly, the tests are slightly updated to also work when the disk isn't
attached. Even though this was not planned, it was very little effort and
thus I applied the boy scout rule.
Closes https://gitlab.com/gitlab-org/gitaly/issues/954
Closes https://gitlab.com/gitlab-org/gitlab-ce/issues/40529
These tasks are happening through housekeeping right now, by default
every 10th push. This removes the need for these tasks.
Side note, this removes one of my first contributions to GitLab, as back
then I introduced these tasks through: 54e6c0045b
Closes https://gitlab.com/gitlab-org/gitaly/issues/768
Because no Git repository was actually created at the temporary path we
were using, `git fsck` would traverse up until it found a repository,
which in our case was the CE or EE repository.
Adds a new method 'puts_time' that prepends the time of a
message when printing it. All instances of 'progress.puts'
in the gitlab:backup:restore tasks are replaced with puts_time.
Example output:
2018-06-03 16:33:25 -0400 -- Restoring uploads ..
Closes #46448
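A minimal sketch of what such a helper could look like. The `progress`
and `now` keyword arguments are assumptions added so the sketch is
self-contained and testable; the real method may have a different
signature.

```ruby
require 'stringio'

# Prints the message prefixed with the current time, e.g.
# "2018-06-03 16:33:25 -0400 -- Restoring uploads ..".
# `progress` and `now` are hypothetical parameters for illustration.
def puts_time(msg, progress: $stdout, now: Time.now)
  progress.puts "#{now} -- #{msg}"
  progress.flush
end

out = StringIO.new
fixed_time = Time.new(2018, 6, 3, 16, 33, 25, '-04:00')
puts_time('Restoring uploads ..', progress: out, now: fixed_time)
```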
Direct disk access is done through Gitaly now, so the legacy path was
deprecated. This path was used in Gitlab::Shell however. This required
the refactoring in this commit.
Also included is the removal of direct path access on the project model,
as that lookup wasn't needed anymore in most cases.
Closes https://gitlab.com/gitlab-org/gitaly/issues/1111