This brings back some of the changes in
https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/20339.
For users using Gitaly on top of NFS, accessing the Git data directly
via Rugged is more performant than Gitaly. This merge request introduces
the feature flag `rugged_find_commit` to activate Rugged paths.
There are also Rake tasks `gitlab:features:enable_rugged` and
`gitlab:features:disable_rugged` to enable/disable these feature
flags altogether.
Part of four Rugged changes identified in
https://gitlab.com/gitlab-org/gitlab-ce/issues/57317.
This commit, introduced in https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/23812,
fixes a problem creating a displaying image diff notes when the image
is stored in LFS. The main problem was that `Gitlab::Diff::File` was
returning an invalid valid in `text?` for this kind of files.
It also fixes a rendering problem with other LFS files, like text
ones. They LFS pointer shouldn't be shown when LFS is enabled
for the project, but they were.
When a project is forked, the new repository used to be a deep copy of everything
stored on disk by leveraging `git clone`. This works well, and makes isolation
between repository easy. However, the clone is at the start 100% the same as the
origin repository. And in the case of the objects in the object directory, this
is almost always going to be a lot of duplication.
Object Pools are a way to create a third repository that essentially only exists
for its 'objects' subdirectory. This third repository's object directory will be
set as alternate location for objects. This means that in the case an object is
missing in the local repository, git will look in another location. This other
location is the object pool repository.
When Git performs garbage collection, it's smart enough to check the
alternate location. When objects are duplicated, it will allow git to
throw one copy away. This copy is on the local repository, where to pool
remains as is.
These pools have an origin location, which for now will always be a
repository that itself is not a fork. When the root of a fork network is
forked by a user, the fork still clones the full repository. Async, the
pool repository will be created.
Either one of these processes can be done earlier than the other. To
handle this race condition, the Join ObjectPool operation is
idempotent. Given its idempotent, we can schedule it twice, with the
same effect.
To accommodate the holding of state two migrations have been added.
1. Added a state column to the pool_repositories column. This column is
managed by the state machine, allowing for hooks on transitions.
2. pool_repositories now has a source_project_id. This column in
convenient to have for multiple reasons: it has a unique index allowing
the database to handle race conditions when creating a new record. Also,
it's nice to know who the host is. As that's a short link to the fork
networks root.
Object pools are only available for public project, which use hashed
storage and when forking from the root of the fork network. (That is,
the project being forked from itself isn't a fork)
In this commit message I use both ObjectPool and Pool repositories,
which are alike, but different from each other. ObjectPool refers to
whatever is on the disk stored and managed by Gitaly. PoolRepository is
the record in the database.
This allows users to add patches as attachments to merge request
created via email.
When an email to create a merge request is sent, all the attachments
ending in `.patch` will be applied to the branch specified in the
subject of the email. If the branch did not exist, it will be created
from the HEAD of the repository.
When the patches could not be applied, the error message will be
replied to the user.
The patches can have a maximum combined size of 2MB for now.
Cleanup code, and refactor tests that still use Rugged. After this, there should
be no Rugged code that access the instance's repositories on non-test
environments. There is still some rugged code for other tasks like the
repository import task, but since it doesn't access any repository storage path
it can stay.
We used to apply this limitation on GitLab when using Rugged. Now that
we've shifted to Gitaly, this parameter needs to be sent via a RPC
request. Currently this value is hardcoded on Gitaly.
Without this parameter, every load of a Wiki page will load all the Wiki pages
in the repository for the sidebar. This is a significant performance penalty
that can significant slow the display of all Wiki pages.
Relates to #40101
When we save merge request diffs to the database, we need to expand the diff
before doing so. That's so that we can expand diffs (within the normal limits)
without hitting the repository, but just by going to the database.
This is done implicitly - diffs are expanded unless we say otherwise. However,
we have another option we can pass, that lets us enforce diff size limits, that
defaults to true.
Prior to this commit:
- The Rugged code path defaulted to setting `expanded: true` and
`enforce_limits: true`.
- The Gitaly code path defaulted to setting `expanded: false` and
`enforce_limits: true`.
This was introduced by eb36fa17a6, which
implemented the initial feature. Since then, if the `gitaly_diff_between`
feature flag was enabled, MRs would have diffs that could not be expanded in
some cases, with no fix other than to disable the feature flag and force push to
the MR to refresh the diff in the database.
Clients can now request the attributes from `$GIT_DIR/info/attributes`
through Gitaly. The Gitaly migration is described in gitlab-org/gitaly#1082.
The parser algorithm was implemented in a way it could handle both file
contents or a File handle, and both were already tested.
Other than that, using the boy scout rule, I've removed a class,
InfoAttributes, as it was delegating everything to the parser and
therefor wasn't really needed in my opinion.
When a repository does not exist on a remote, Gitaly won't be able to
clone it. This is correct behaviour, but from the clients perspective a
change in behaviour.
This change implements the client side changes that allows Gitaly to
execute a `git ls-remote <remote-url> HEAD`. This way the client has no
need to shell out to Git.
In the situation where multiple Gitalies are available, one is chosen at
random.
This commit closes https://gitlab.com/gitlab-org/gitlab-ce/issues/43929,
while its also a part of https://gitlab.com/gitlab-org/gitaly/issues/1084
By default, --prune is added to the command-line of a `git fetch` operation,
but for repositories with many references this can take a long time to run. We
shouldn't need to run --prune the first time we fetch a new repository.
Resolve "Gitaly::CommitService: Encoding::UndefinedConversionError: U+C124 from UTF-8 to ASCII-8BIT"
Closes#42161
See merge request gitlab-org/gitlab-ce!16637