Add latest changes from gitlab-org/gitlab@master

GitLab Bot 2023-09-08 03:11:54 +00:00
parent e85e128aa0
commit 72db887953
11 changed files with 432 additions and 315 deletions

View File

@ -732,7 +732,6 @@ lib/gitlab/checks/**
/doc/ci/examples/deployment/ @phillipwells
/doc/ci/examples/semantic-release.md @phillipwells
/doc/ci/interactive_web_terminal/ @fneill
/doc/ci/large_repositories/ @fneill
/doc/ci/resource_groups/ @phillipwells
/doc/ci/runners/ @fneill
/doc/ci/services/ @fneill
@ -977,7 +976,7 @@ lib/gitlab/checks/**
/doc/user/project/repository/ @aqualls
/doc/user/project/repository/code_suggestions/ @sselhorn
/doc/user/project/repository/file_finder.md @ashrafkhamis
/doc/user/project/repository/managing_large_repositories.md @axil
/doc/user/project/repository/managing_large_repositories.md @eread
/doc/user/project/repository/web_editor.md @ashrafkhamis
/doc/user/project/requirements/ @msedlakjakubowski
/doc/user/project/service_desk/ @msedlakjakubowski

View File

@ -281,6 +281,8 @@
- 1
- - gitlab_shell
- 2
- - gitlab_subscriptions_add_on_purchases_cleanup_user_add_on_assignment
- 1
- - gitlab_subscriptions_refresh_seats
- 1
- - gitlab_subscriptions_trials_apply_trial

View File

@ -113,7 +113,7 @@ requires at least two separate environments:
- One primary site.
- One or more secondary sites that serve as replicas.
If the primary site becomes unavailable, you can fail over to one of the secondary sites.
This **advanced and complex** setup should only be undertaken if DR is
@ -204,7 +204,7 @@ However, additional workloads can multiply the impact of operations by triggerin
You may need to adjust the suggested specifications to compensate if you use, for example:
- Security software on the nodes.
- Hundreds of concurrent CI jobs for [large repositories](../../ci/large_repositories/index.md).
- Hundreds of concurrent CI jobs for [large repositories](../../user/project/repository/managing_large_repositories.md).
- Custom scripts that [run at high frequency](../logs/log_parsing.md#print-top-api-user-agents).
- [Integrations](../../integration/index.md) in many large projects.
- [Server hooks](../server_hooks.md).

View File

@ -1,254 +1,11 @@
---
stage: Verify
group: Runner
info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/product/ux/technical-writing/#assignments
type: reference
redirect_to: '../../user/project/repository/managing_large_repositories.md'
remove_date: '2023-11-30'
---
# Optimize GitLab for large repositories **(FREE ALL)**
This document was moved to [another location](../../user/project/repository/managing_large_repositories.md).
Large repositories consisting of more than 50k files in a worktree
may require more optimizations beyond
[pipeline efficiency](../pipelines/pipeline_efficiency.md)
because of the time required to clone and check out.
GitLab and GitLab Runner handle this scenario well
but require optimized configuration to efficiently perform its
set of operations.
The general guidelines for handling big repositories are simple.
Each guideline is described in more detail in the sections below:
- Always fetch incrementally. Do not clone in a way that results in recreating all of the worktree.
- Always use shallow clones to reduce data transfer. Be aware that this puts more burden
on the GitLab instance because of higher CPU impact.
- Control the clone directory if you heavily use a fork-based workflow.
- Optimize `git clean` flags to ensure that you remove or keep data that might affect or speed-up your build.
## Shallow cloning
> Introduced in GitLab Runner 8.9.
GitLab and GitLab Runner perform a [shallow clone](../pipelines/settings.md#limit-the-number-of-changes-fetched-during-clone)
by default.
Ideally, you should always use `GIT_DEPTH` with a small number
like 10. This instructs GitLab Runner to perform shallow clones.
Shallow clones make Git request only the latest set of changes for a given branch,
up to desired number of commits as defined by the `GIT_DEPTH` variable.
This significantly speeds up fetching of changes from Git repositories,
especially if the repository has a very long backlog consisting of number
of big files as we effectively reduce amount of data transfer.
The following example makes the runner shallow clone to fetch only a given branch;
it does not fetch any other branches nor tags.
```yaml
variables:
  GIT_DEPTH: 10

test:
  script:
    - ls -al
```
## Git strategy
> Introduced in GitLab Runner 8.9.
By default, GitLab is configured to use the [`fetch` Git strategy](../runners/configure_runners.md#git-strategy),
which is recommended for large repositories.
This strategy reduces the amount of data to transfer and
does not really impact the operations that you might do on a repository from CI.
## Git clone path
> Introduced in GitLab Runner 11.10.
[`GIT_CLONE_PATH`](../runners/configure_runners.md#custom-build-directories) allows you to
control where you clone your sources. This can have implications if you
heavily use big repositories with fork workflow.
Fork workflow from GitLab Runner's perspective is stored as a separate repository
with separate worktree. That means that GitLab Runner cannot optimize the usage
of worktrees and you might have to instruct GitLab Runner to use that.
In such cases, ideally you want to make the GitLab Runner executor be used only
for the given project and not shared across different projects to make this
process more efficient.
The [`GIT_CLONE_PATH`](../runners/configure_runners.md#custom-build-directories) has to be
within the `$CI_BUILDS_DIR`. Currently, it is impossible to pick any path
from disk.
## Git clean flags
> Introduced in GitLab Runner 11.10.
[`GIT_CLEAN_FLAGS`](../runners/configure_runners.md#git-clean-flags) allows you to control
whether or not you require the `git clean` command to be executed for each CI
job. By default, GitLab ensures that you have your worktree on the given SHA,
and that your repository is clean.
[`GIT_CLEAN_FLAGS`](../runners/configure_runners.md#git-clean-flags) is disabled when set
to `none`. On very big repositories, this might be desired because `git
clean` is disk I/O intensive. Controlling that with `GIT_CLEAN_FLAGS: -ffdx
-e .build/` (for example) allows you to control and disable removal of some
directories within the worktree between subsequent runs, which can speed-up
the incremental builds. This has the biggest effect if you re-use existing
machines and have an existing worktree that you can re-use for builds.
For exact parameters accepted by
[`GIT_CLEAN_FLAGS`](../runners/configure_runners.md#git-clean-flags), see the documentation
for [`git clean`](https://git-scm.com/docs/git-clean). The available parameters
are dependent on Git version.
## Git fetch extra flags
> [Introduced](https://gitlab.com/gitlab-org/gitlab-runner/-/issues/4142) in GitLab Runner 13.1.
[`GIT_FETCH_EXTRA_FLAGS`](../runners/configure_runners.md#git-fetch-extra-flags) allows you
to modify `git fetch` behavior by passing extra flags.
For example, if your project contains a large number of tags that your CI jobs don't rely on,
you could add [`--no-tags`](https://git-scm.com/docs/git-fetch#Documentation/git-fetch.txt---no-tags)
to the extra flags to make your fetches faster and more compact.
Also in the case where your repository does _not_ contain a lot of
tags, `--no-tags` can [make a big difference in some cases](https://gitlab.com/gitlab-com/gl-infra/scalability/-/issues/746).
If your CI builds do not depend on Git tags it is worth trying.
See the [`GIT_FETCH_EXTRA_FLAGS` documentation](../runners/configure_runners.md#git-fetch-extra-flags)
for more information.
## Fork-based workflow
> Introduced in GitLab Runner 11.10.
Following the guidelines above, let's imagine that we want to:
- Optimize for a big project (more than 50k files in directory).
- Use forks-based workflow for contributing.
- Reuse existing worktrees. Have preconfigured runners that are pre-cloned with repositories.
- Runner assigned only to project and all forks.
Let's consider the following two examples, one using `shell` executor and
other using `docker` executor.
### `shell` executor example
Let's assume that you have the following [`config.toml`](https://docs.gitlab.com/runner/configuration/advanced-configuration.html).
```toml
concurrent = 4
[[runners]]
url = "GITLAB_URL"
token = "TOKEN"
executor = "shell"
builds_dir = "/builds"
cache_dir = "/cache"
[runners.custom_build_dir]
enabled = true
```
This `config.toml`:
- Uses the `shell` executor.
- Specifies a custom `/builds` directory where all clones are stored.
- Enables the ability to specify `GIT_CLONE_PATH`.
- Runs at most 4 jobs at once.
### `docker` executor example
Let's assume that you have the following [`config.toml`](https://docs.gitlab.com/runner/configuration/advanced-configuration.html).
```toml
concurrent = 4
[[runners]]
url = "GITLAB_URL"
token = "TOKEN"
executor = "docker"
builds_dir = "/builds"
cache_dir = "/cache"
[runners.docker]
volumes = ["/builds:/builds", "/cache:/cache"]
```
This `config.toml`:
- Uses the `docker` executor.
- Specifies a custom `/builds` directory on disk where all clones are stored.
We host mount the `/builds` directory to make it reusable between subsequent runs
and be allowed to override the cloning strategy.
- Doesn't enable the ability to specify `GIT_CLONE_PATH` as it is enabled by default.
- Runs at most 4 jobs at once.
### Our `.gitlab-ci.yml`
Once we have the executor configured, we need to fine tune our `.gitlab-ci.yml`.
Our pipeline is most performant if we use the following `.gitlab-ci.yml`:
```yaml
variables:
  GIT_CLONE_PATH: $CI_BUILDS_DIR/$CI_CONCURRENT_ID/$CI_PROJECT_NAME

build:
  script: ls -al
```
This YAML setting configures a custom clone path. This path makes it possible to re-use worktrees
between the parent project and forks because we use the same clone path for all forks.
Why use `$CI_CONCURRENT_ID`? The main reason is to ensure that worktrees used are not conflicting
between projects. The `$CI_CONCURRENT_ID` represents a unique identifier within the given executor.
When we use it to construct the path, this directory does not conflict
with other concurrent jobs running.
### Store custom clone options in `config.toml`
Ideally, all job-related configuration should be stored in `.gitlab-ci.yml`.
However, sometimes it is desirable to make these schemes part of the runner's configuration.
In the above example of Forks, making this configuration discoverable for users may be preferred,
but this brings administrative overhead as the `.gitlab-ci.yml` needs to be updated for each branch.
In such cases, it might be desirable to keep the `.gitlab-ci.yml` clone path agnostic, but make it
a configuration of the runner.
We can extend our [`config.toml`](https://docs.gitlab.com/runner/configuration/advanced-configuration.html)
with the following specification that is used by the runner if `.gitlab-ci.yml` does not override it:
```toml
concurrent = 4
[[runners]]
url = "GITLAB_URL"
token = "TOKEN"
executor = "docker"
builds_dir = "/builds"
cache_dir = "/cache"
environment = [
"GIT_CLONE_PATH=$CI_BUILDS_DIR/$CI_CONCURRENT_ID/$CI_PROJECT_NAME"
]
[runners.docker]
volumes = ["/builds:/builds", "/cache:/cache"]
```
This makes the cloning configuration to be part of the given runner
and does not require us to update each `.gitlab-ci.yml`.
## Git fetch caching step
For very active repositories with a large number of references and files, consider using the
[Gitaly pack-objects cache](../../administration/gitaly/configure_gitaly.md#pack-objects-cache).
The pack-objects cache:
- Benefits all repositories on your GitLab server.
- Automatically works for forks.
<!-- This redirect file can be deleted after <2023-11-30>. -->
<!-- Redirects that point to other docs in the same project expire in three months. -->
<!-- Redirects that point to docs in a different project or site (for example, link is not relative and starts with `https:`) expire in one year. -->
<!-- Before deletion, see: https://docs.gitlab.com/ee/development/documentation/redirects.html -->

View File

@ -28,7 +28,7 @@ The easiest indicators to check for inefficient pipelines are the runtimes of th
stages, and the total runtime of the pipeline itself. The total pipeline duration is
heavily influenced by the:
- [Size of the repository](../large_repositories/index.md)
- [Size of the repository](../../user/project/repository/managing_large_repositories.md)
- Total number of stages and jobs.
- Dependencies between jobs.
- The ["critical path"](#directed-acyclic-graphs-dag-visualization), which represents

View File

@ -169,7 +169,7 @@ You can choose how your repository is fetched from GitLab when a job runs.
for every job. However, the local working copy is always pristine.
- `git fetch` is faster because it re-uses the local working copy (and falls
back to clone if it doesn't exist). This is recommended, especially for
[large repositories](../large_repositories/index.md#git-strategy).
[large repositories](../../user/project/repository/managing_large_repositories.md#git-strategy).
The configured Git strategy can be overridden by the [`GIT_STRATEGY` variable](../runners/configure_runners.md#git-strategy)
in the `.gitlab-ci.yml` file.
@ -192,7 +192,7 @@ a repository.
In GitLab versions 14.7 and later, newly created projects have a default `git depth`
value of `20`. GitLab versions 14.6 and earlier have a default `git depth` value of `50`.
This value can be overridden by the [`GIT_DEPTH` variable](../large_repositories/index.md#shallow-cloning)
This value can be overridden by the [`GIT_DEPTH` variable](../../user/project/repository/managing_large_repositories.md#shallow-cloning)
in the `.gitlab-ci.yml` file.
## Set a limit for how long jobs can run

View File

@ -197,7 +197,7 @@ The root causes vary, so multiple potential solutions exist, and you may need to
apply more than one:
- If this error occurs when cloning a large repository, you can
[decrease the cloning depth](../../ci/large_repositories/index.md#shallow-cloning)
[decrease the cloning depth](../../user/project/repository/managing_large_repositories.md#shallow-cloning)
to a value of `1`. For example:
```shell

View File

@ -25,12 +25,9 @@ has additional information about upgrading, including:
Depending on the installation method and your GitLab version, there are multiple
official ways to upgrade GitLab:
- [Linux packages (Omnibus)](#linux-packages-omnibus)
- [Self-compiled installations](#self-compiled-installation)
- [Docker installations](#installation-using-docker)
- [Kubernetes (Helm) installations](#installation-using-helm)
::Tabs
### Linux packages (Omnibus)
:::TabTitle Linux packages (Omnibus)
The [package upgrade guide](package/index.md)
contains the steps needed to upgrade a package installed by official GitLab
@ -39,12 +36,27 @@ repositories.
There are also instructions when you want to
[upgrade to a specific version](package/index.md#upgrade-to-a-specific-version-using-the-official-repositories).
### Self-compiled installation
:::TabTitle Helm chart (Kubernetes)
GitLab can be deployed into a Kubernetes cluster using Helm.
Instructions on how to upgrade a cloud-native deployment are in
[a separate document](https://docs.gitlab.com/charts/installation/upgrade.html).
Use the [version mapping](https://docs.gitlab.com/charts/installation/version_mappings.html)
from the chart version to GitLab version to determine the [upgrade path](#upgrade-paths).
:::TabTitle Docker
GitLab provides official Docker images for both Community and Enterprise
editions, and they are based on the Omnibus package. See how to
[install GitLab using Docker](../install/docker.md).
:::TabTitle Self-compiled (source)
- [Upgrading Community Edition and Enterprise Edition from source](upgrading_from_source.md) -
The guidelines for upgrading Community Edition and Enterprise Edition from source.
- [Patch versions](patch_versions.md) guide includes the steps needed for a
patch version, such as 13.2.0 to 13.2.1, and apply to both Community and Enterprise
patch version, such as 15.2.0 to 15.2.1, and apply to both Community and Enterprise
Editions.
In the past we used separate documents for the upgrading instructions, but we
@ -54,20 +66,7 @@ can still be found in the Git repository:
- [Old upgrading guidelines for Community Edition](https://gitlab.com/gitlab-org/gitlab-foss/tree/11-8-stable/doc/update)
- [Old upgrading guidelines for Enterprise Edition](https://gitlab.com/gitlab-org/gitlab/-/tree/11-8-stable-ee/doc/update)
### Installation using Docker
GitLab provides official Docker images for both Community and Enterprise
editions, and they are based on the Omnibus package. See how to
[install GitLab using Docker](../install/docker.md).
### Installation using Helm
GitLab can be deployed into a Kubernetes cluster using Helm.
Instructions on how to upgrade a cloud-native deployment are in
[a separate document](https://docs.gitlab.com/charts/installation/upgrade.html).
Use the [version mapping](https://docs.gitlab.com/charts/installation/version_mappings.html)
from the chart version to GitLab version to determine the [upgrade path](#upgrade-paths).
::EndTabs
## Plan your upgrade

View File

@ -103,11 +103,7 @@ For the upgrade plan, start by creating an outline of a plan that best applies
to your instance and then upgrade it for any relevant features you're using.
- Generate an upgrade plan by reading and understanding the relevant documentation:
- upgrade based on the installation method:
- [Linux package (Omnibus)](index.md#linux-packages-omnibus)
- [Self-compiled](index.md#self-compiled-installation)
- [Docker](index.md#installation-using-docker)
- [Helm Charts](index.md#installation-using-helm)
- Upgrade based on the [installation method](index.md#upgrade-based-on-installation-method).
- [Zero-downtime upgrades](zero_downtime.md) (if possible and desired)
- [Convert from GitLab Community Edition to Enterprise Edition](package/convert_to_ee.md)
- What version should you upgrade to:

View File

@ -1,53 +1,411 @@
---
stage: Systems
group: Distribution
group: Gitaly
info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/product/ux/technical-writing/#assignments
description: "Documentation on large repositories."
---
# Managing large repositories **(FREE SELF)**
# Managing monorepos
GitLab, like any Git based system, is subject to similar performance restraints when it comes to large
repositories that size into the gigabytes.
Monorepos have become a regular part of development team workflows. While they have many advantages, monorepos can present performance challenges
when using them in GitLab. Therefore, you should know:
In the following sections, we detail several best practices for improving performance with these large repositories on GitLab.
- What repository characteristics can impact performance.
- Some tools and steps to optimize monorepos.
## Large File System (LFS)
## Impact on performance
It's *strongly* recommended in any Git system that binary or blob files (for example, packages, audio, video, or graphics) are stored as Large File Storage (LFS) objects. With LFS, the objects are stored externally, such as in Object Storage, which reduces the number and size of objects in the repository. Storing objects in external Object Storage can improve performance.
Because GitLab is a Git-based system, it is subject to similar performance
constraints as Git when it comes to large repositories that are gigabytes in
size.
To analyze if a repository has large objects, you can use a tool like [`git-sizer`](https://github.com/github/git-sizer) for detailed analysis. This tool shows details about what makes up the repository, and highlights any areas of concern. If any large objects are found, you can then remove them with a tool such as [`git filter-repo`](reducing_the_repo_size_using_git.md).
Monorepos can be large for [many reasons](https://about.gitlab.com/blog/2022/09/06/speed-up-your-monorepo-workflow-in-git/#characteristics-of-monorepos).
Large repositories pose a performance risk when used in GitLab, especially if a large monorepo receives many clones or pushes a day, which is common for them.
Git itself has performance limitations when it comes to handling
monorepos.
[Gitaly](https://gitlab.com/gitlab-org/gitaly) is our Git storage service built
on top of [Git](https://git-scm.com/). This means that any limitations of
Git are experienced in Gitaly, and in turn by end users of GitLab.
## Profiling repositories
Large repositories generally experience performance issues in Git. Knowing why
your repository is large can help you develop mitigation strategies to avoid
performance problems.
You can use [`git-sizer`](https://github.com/github/git-sizer) to get a snapshot
of repository characteristics and discover problem aspects of your monorepo.
For example:
```shell
Processing blobs: 1652370
Processing trees: 3396199
Processing commits: 722647
Matching commits to trees: 722647
Processing annotated tags: 534
Processing references: 539
| Name | Value | Level of concern |
| ---------------------------- | --------- | ------------------------------ |
| Overall repository size | | |
| * Commits | | |
| * Count | 723 k | * |
| * Total size | 525 MiB | ** |
| * Trees | | |
| * Count | 3.40 M | ** |
| * Total size | 9.00 GiB | **** |
| * Total tree entries | 264 M | ***** |
| * Blobs | | |
| * Count | 1.65 M | * |
| * Total size | 55.8 GiB | ***** |
| * Annotated tags | | |
| * Count | 534 | |
| * References | | |
| * Count | 539 | |
| | | |
| Biggest objects | | |
| * Commits | | |
| * Maximum size [1] | 72.7 KiB | * |
| * Maximum parents [2] | 66 | ****** |
| * Trees | | |
| * Maximum entries [3] | 1.68 k | * |
| * Blobs | | |
| * Maximum size [4] | 13.5 MiB | * |
| | | |
| History structure | | |
| * Maximum history depth | 136 k | |
| * Maximum tag depth [5] | 1 | |
| | | |
| Biggest checkouts | | |
| * Number of directories [6] | 4.38 k | ** |
| * Maximum path depth [7] | 13 | * |
| * Maximum path length [8] | 134 B | * |
| * Number of files [9] | 62.3 k | * |
| * Total size of files [9] | 747 MiB | |
| * Number of symlinks [10] | 40 | |
| * Number of submodules | 0 | |
```
In this example, a few items are raised with a high level of concern. See the
following sections for information on solving:
- A high number of references.
- Large blobs.
### Large number of references
A reference in Git (a branch or tag) is used to refer to a commit. Each
reference is stored as an individual file. If you are curious, you can go
to any `.git` directory and look under the `refs` directory.
A large number of references can cause performance problems because, with more references,
object walks that Git does are larger for various operations such as clones, pushes, and
housekeeping tasks.
#### Mitigation strategies
To mitigate the effects of a large number of references in a monorepo:
- Create an automated process for cleaning up old branches.
- If certain references don't need to be visible to the client, hide them using the
[`transfer.hideRefs`](https://git-scm.com/docs/git-config#Documentation/git-config.txt-transferhideRefs)
configuration setting. Because Gitaly ignores any on-server Git configuration, you must change the Gitaly configuration
itself in `/etc/gitlab/gitlab.rb`:
```ruby
gitaly['configuration'] = {
  # ...
  git: {
    # ...
    config: [
      # ...
      { key: "transfer.hideRefs", value: "refs/namespace_to_hide" },
    ],
  },
}
```
In Git 2.42.0 and later, different Git operations can skip over hidden references
when doing an object graph walk.
### Using LFS for large blobs
Because Git is built to handle text data, it doesn't handle large
binary files efficiently.
Therefore, you should store binary or blob files (for example, packages, audio, video, or graphics)
as Large File Storage (LFS) objects. With LFS, the objects are stored externally, such as in Object
Storage, which reduces the number and size of objects in the repository. Storing
objects in external Object Storage can improve performance.
To analyze if a repository has large objects, you can use a tool like
[`git-sizer`](https://github.com/github/git-sizer) for detailed analysis. This
tool shows details about what makes up the repository, and highlights any areas
of concern. If any large objects are found, you can then remove them with a tool
such as [`git filter-repo`](reducing_the_repo_size_using_git.md).
For more information, refer to the [Git LFS documentation](../../../topics/git/lfs/index.md).
## Gitaly Pack Objects Cache
## Optimizing large repositories for GitLab
Gitaly, the service that provides storage for Git repositories, can be configured to cache a short rolling window of Git fetch responses. This is recommended for large repositories as it can notably reduce server load when your server receives lots of fetch traffic.
Other than modifying your workflow and the actual repository, you can take other
steps to maximize performance of monorepos with GitLab.
Refer to the [Gitaly Pack Objects Cache for more information](../../../administration/gitaly/configure_gitaly.md#pack-objects-cache).
### Gitaly pack-objects cache
## Reference Architectures
For very active repositories with a large number of references and files, consider using the
[Gitaly pack-objects cache](../../../administration/gitaly/configure_gitaly.md#pack-objects-cache).
The pack-objects cache:
Large repositories tend to be found in larger organisations with many users. The GitLab Quality and Support teams provide several [Reference Architectures](../../../administration/reference_architectures/index.md) that are the recommended way to deploy GitLab at scale.
- Benefits all repositories on your GitLab server.
- Automatically works for forks.
In these types of setups it's recommended that the GitLab environment used matches a Reference Architecture to improve performance.
You should always:
## Gitaly Cluster
- Fetch incrementally. Do not clone in a way that recreates all of the worktree.
- Use shallow clones to reduce data transfer. Be aware that this puts more burden on the GitLab instance because of higher CPU impact.
Gitaly Cluster can notably improve large repository performance as it holds multiple replicas of the repository across several nodes. As a result, Gitaly Cluster can load balance read requests against those repositories and is also fault-tolerant.
Control the clone directory if you heavily use a fork-based workflow. Optimize
`git clean` flags to ensure that you remove or keep data that might affect or
speed-up your build.
It's recommended for large repositories, however, Gitaly Cluster is a large solution with additional complexity of setup, and management. Refer to the [Gitaly Cluster documentation for more information](../../../administration/gitaly/index.md), specifically the [Before deploying Gitaly Cluster](../../../administration/gitaly/index.md#before-deploying-gitaly-cluster) section.
For more information, see [Pack-objects cache](../../../administration/gitaly/configure_gitaly.md#pack-objects-cache).
## Keep GitLab up to date
### Reduce concurrent clones in CI/CD
Performance improvements and fixes are added continuously in GitLab. As such, it's recommended you keep GitLab updated to the latest version where possible to benefit from these.
Large repositories tend to be monorepos. This usually means that these
repositories get a lot of traffic not only from users, but from CI/CD.
## Reduce concurrent clones in CI/CD
CI/CD loads tend to be concurrent because pipelines are scheduled during set times.
As a result, the Git requests against the repositories can spike notably during
these times and lead to reduced performance for both CI/CD and users alike.
Large repositories tend to be monorepos. This in turn typically means that these repositories get a lot of traffic not only from users, but from CI/CD.
You should reduce CI/CD pipeline concurrency by staggering them to run at different times. For example, a set running at one time and another set running several
minutes later.
CI/CD loads tend to be concurrent as pipelines are scheduled during set times. As a result, the Git requests against the repositories can spike notably during these times and lead to reduced performance for both CI and users alike.
#### Shallow cloning
When designing CI/CD pipelines, it's advisable to reduce their concurrency by staggering them to run at different times, for example, a set running at one time, and another set running several minutes later.
GitLab and GitLab Runner perform a [shallow clone](../../../ci/pipelines/settings.md#limit-the-number-of-changes-fetched-during-clone)
by default.
There's several other actions that can be explored to improve CI/CD performance with large repositories. Refer to the [Runner documentation for more information](../../../ci/large_repositories/index.md).
Ideally, you should always use `GIT_DEPTH` with a small number
like 10. This instructs GitLab Runner to perform shallow clones.
Shallow clones make Git request only the latest set of changes for a given branch,
up to the desired number of commits as defined by the `GIT_DEPTH` variable.
This significantly speeds up fetching of changes from Git repositories,
especially if the repository has a very long backlog consisting of a number
of big files, because it effectively reduces the amount of data transferred.
The following pipeline configuration example makes the runner shallow clone to fetch only a given branch.
The runner does not fetch any other branches nor tags.
```yaml
variables:
  GIT_DEPTH: 10

test:
  script:
    - ls -al
```
#### Git strategy
By default, GitLab is configured to use the [`fetch` Git strategy](../../../ci/runners/configure_runners.md#git-strategy),
which is recommended for large repositories.
This strategy reduces the amount of data to transfer and
does not really impact the operations that you might do on a repository from CI/CD.
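As a minimal sketch, the strategy can also be set explicitly in `.gitlab-ci.yml` with the `GIT_STRATEGY` variable, so the behavior does not depend on the project setting. The job name `test` and its script are placeholders for illustration only:

```yaml
variables:
  # Reuse the existing local working copy instead of cloning from scratch.
  GIT_STRATEGY: fetch

test:
  script:
    - ls -al
```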
#### Git clone path
[`GIT_CLONE_PATH`](../../../ci/runners/configure_runners.md#custom-build-directories) allows you to
control where you clone your repositories. This can have implications if you
heavily use big repositories with a fork-based workflow.
A fork, from the perspective of GitLab Runner, is stored as a separate repository
with a separate worktree. That means that GitLab Runner cannot optimize the usage
of worktrees and you might have to instruct GitLab Runner to use that.
In such cases, ideally you want to make the GitLab Runner executor be used only
for the given project and not shared across different projects to make this
process more efficient.
The [`GIT_CLONE_PATH`](../../../ci/runners/configure_runners.md#custom-build-directories) must be
in the directory set in `$CI_BUILDS_DIR`. You can't pick any path from disk.
#### Git clean flags
[`GIT_CLEAN_FLAGS`](../../../ci/runners/configure_runners.md#git-clean-flags) allows you to control
whether or not you require the `git clean` command to be executed for each CI/CD
job. By default, GitLab ensures that:
- You have your worktree on the given SHA.
- Your repository is clean.
[`GIT_CLEAN_FLAGS`](../../../ci/runners/configure_runners.md#git-clean-flags) is disabled when set
to `none`. On very big repositories, this might be desired because `git
clean` is disk I/O intensive. Controlling that with `GIT_CLEAN_FLAGS: -ffdx
-e .build/` (for example) allows you to control and disable removal of some
directories in the worktree between subsequent runs, which can speed-up
the incremental builds. This has the biggest effect if you re-use existing
machines and have an existing worktree that you can re-use for builds.
For exact parameters accepted by
[`GIT_CLEAN_FLAGS`](../../../ci/runners/configure_runners.md#git-clean-flags), see the documentation
for [`git clean`](https://git-scm.com/docs/git-clean). The available parameters
are dependent on the Git version.
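The flags mentioned above might be applied as in the following sketch. The `.build/` directory is only an illustrative name for a directory you want to keep between runs, and the job name `build` is a placeholder:

```yaml
variables:
  # Clean the worktree, but exclude the (hypothetical) .build/ directory
  # so its contents survive between subsequent runs.
  GIT_CLEAN_FLAGS: -ffdx -e .build/

build:
  script:
    - ls -al
```

To skip `git clean` entirely, set `GIT_CLEAN_FLAGS: none` instead.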
#### Git fetch extra flags
[`GIT_FETCH_EXTRA_FLAGS`](../../../ci/runners/configure_runners.md#git-fetch-extra-flags) allows you
to modify `git fetch` behavior by passing extra flags.
For example, if your project contains a large number of tags that your CI/CD jobs don't rely on,
you could add [`--no-tags`](https://git-scm.com/docs/git-fetch#Documentation/git-fetch.txt---no-tags)
to the extra flags to make your fetches faster and more compact.
Even if your repository does _not_ contain a lot of
tags, `--no-tags` can [make a big difference in some cases](https://gitlab.com/gitlab-com/gl-infra/scalability/-/issues/746).
If your CI/CD builds do not depend on Git tags, setting `--no-tags` is worth trying.
For more information, see the [`GIT_FETCH_EXTRA_FLAGS` documentation](../../../ci/runners/configure_runners.md#git-fetch-extra-flags).
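
For illustration, a `.gitlab-ci.yml` fragment that passes `--no-tags` to `git fetch` could look like the following. This is a sketch that assumes none of your jobs rely on Git tags. The job name `build` is a placeholder.

```yaml
variables:
  # Skip fetching tags. Only suitable if no job depends on them.
  GIT_FETCH_EXTRA_FLAGS: --no-tags

build:
  script:
    - ls -al
```

Setting this variable might replace the runner's default extra fetch flags rather than add to them, so check the linked documentation before relying on those defaults.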
#### Fork-based workflow
Following the guidelines above, let's imagine that we want to:
- Optimize for a big project (more than 50k files in directory).
- Use a fork-based workflow for contributing.
- Reuse existing worktrees by having preconfigured runners that are pre-cloned with repositories.
- Assign the runner only to the project and all its forks.
Let's consider the following two examples, one using the `shell` executor and
the other using the `docker` executor.
##### `shell` executor example
Let's assume that you have the following [`config.toml`](https://docs.gitlab.com/runner/configuration/advanced-configuration.html).
```toml
concurrent = 4
[[runners]]
url = "GITLAB_URL"
token = "TOKEN"
executor = "shell"
builds_dir = "/builds"
cache_dir = "/cache"
[runners.custom_build_dir]
enabled = true
```
This `config.toml`:
- Uses the `shell` executor.
- Specifies a custom `/builds` directory where all clones are stored.
- Enables the ability to specify `GIT_CLONE_PATH`.
- Runs at most 4 jobs at once.
##### `docker` executor example
Let's assume that you have the following [`config.toml`](https://docs.gitlab.com/runner/configuration/advanced-configuration.html).
```toml
concurrent = 4
[[runners]]
url = "GITLAB_URL"
token = "TOKEN"
executor = "docker"
builds_dir = "/builds"
cache_dir = "/cache"
[runners.docker]
volumes = ["/builds:/builds", "/cache:/cache"]
```
This `config.toml`:
- Uses the `docker` executor.
- Specifies a custom `/builds` directory on disk where all clones are stored.
  We mount the `/builds` directory from the host to make it reusable between subsequent runs
  and to be able to override the cloning strategy.
- Doesn't enable the ability to specify `GIT_CLONE_PATH` as it is enabled by default.
- Runs at most 4 jobs at once.
##### Our `.gitlab-ci.yml`
Once we have the executor configured, we need to fine-tune our `.gitlab-ci.yml`.
Our pipeline is most performant if we use the following `.gitlab-ci.yml`:
```yaml
variables:
  GIT_CLONE_PATH: $CI_BUILDS_DIR/$CI_CONCURRENT_ID/$CI_PROJECT_NAME

build:
  script: ls -al
```
This YAML setting configures a custom clone path. This path makes it possible to re-use worktrees
between the parent project and forks because we use the same clone path for all forks.
Why use `$CI_CONCURRENT_ID`? The main reason is to ensure that the worktrees used do not conflict
between projects. `$CI_CONCURRENT_ID` represents a unique identifier within the given executor.
When we use it to construct the path, the directory does not conflict
with those of other jobs running concurrently.
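
To confirm where a job was actually cloned, you could add a small diagnostic job such as the sketch below. The job name `check-clone-path` is hypothetical. `$CI_PROJECT_DIR` is the predefined variable that holds the directory the repository is cloned into.

```yaml
# Hypothetical diagnostic job: prints the directory this job was cloned into.
check-clone-path:
  variables:
    GIT_CLONE_PATH: $CI_BUILDS_DIR/$CI_CONCURRENT_ID/$CI_PROJECT_NAME
  script:
    - echo "Cloned into $CI_PROJECT_DIR"
```

Jobs that run at the same time on one runner should report different paths, because each one gets its own `$CI_CONCURRENT_ID`.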
### Store custom clone options in `config.toml`
Ideally, all job-related configuration should be stored in `.gitlab-ci.yml`.
However, sometimes it is desirable to make these schemes part of the runner's configuration.
In the above example of forks, making this configuration discoverable for users may be preferred,
but this brings administrative overhead as the `.gitlab-ci.yml` needs to be updated for each branch.
In such cases, it might be desirable to keep the `.gitlab-ci.yml` clone-path agnostic and make
the clone path part of the runner's configuration instead.
We can extend our [`config.toml`](https://docs.gitlab.com/runner/configuration/advanced-configuration.html)
with the following specification that is used by the runner if `.gitlab-ci.yml` does not override it:
```toml
concurrent = 4
[[runners]]
url = "GITLAB_URL"
token = "TOKEN"
executor = "docker"
builds_dir = "/builds"
cache_dir = "/cache"
environment = [
"GIT_CLONE_PATH=$CI_BUILDS_DIR/$CI_CONCURRENT_ID/$CI_PROJECT_NAME"
]
[runners.docker]
volumes = ["/builds:/builds", "/cache:/cache"]
```
This makes the cloning configuration part of the given runner,
so we do not need to update each `.gitlab-ci.yml`.
### Reference architectures
Large repositories tend to be found in larger organizations with many users. The GitLab Quality and Support teams provide several [reference architectures](../../../administration/reference_architectures/index.md) that are the recommended way to deploy GitLab at scale.
In these types of setups, the GitLab environment used should match a reference architecture to improve performance.
### Gitaly Cluster
Gitaly Cluster can notably improve large repository performance because it holds multiple replicas of the repository across several nodes.
As a result, Gitaly Cluster can load balance read requests against those replicas and is fault-tolerant.
Though Gitaly Cluster is recommended for large repositories, it is a substantial solution that adds setup and management complexity. Refer to the
[Gitaly Cluster documentation for more information](../../../administration/gitaly/index.md), specifically the
[Before deploying Gitaly Cluster](../../../administration/gitaly/index.md#before-deploying-gitaly-cluster) section.
### Keep GitLab up to date
You should keep GitLab updated to the latest version where possible, because performance improvements and fixes are added continuously to GitLab.

View File

@ -42515,9 +42515,6 @@ msgstr ""
msgid "SecurityReports|Add a comment or reason for dismissal"
msgstr ""
msgid "SecurityReports|Add comment & dismiss"
msgstr ""
msgid "SecurityReports|Add or remove projects to monitor in the security area. Projects included in this list will have their results displayed in the security dashboard and vulnerability report."
msgstr ""
@ -42572,6 +42569,9 @@ msgstr ""
msgid "SecurityReports|Configure security testing"
msgstr ""
msgid "SecurityReports|Confirm dismissal"
msgstr ""
msgid "SecurityReports|Create Issue"
msgstr ""
@ -42587,9 +42587,15 @@ msgstr ""
msgid "SecurityReports|Development vulnerabilities"
msgstr ""
msgid "SecurityReports|Dismiss as"
msgstr ""
msgid "SecurityReports|Dismiss vulnerability"
msgstr ""
msgid "SecurityReports|Dismissal comment"
msgstr ""
msgid "SecurityReports|Dismissed '%{vulnerabilityName}'"
msgstr ""
@ -42620,6 +42626,9 @@ msgstr ""
msgid "SecurityReports|Download the patch to apply it manually"
msgstr ""
msgid "SecurityReports|Edit dismissal"
msgstr ""
msgid "SecurityReports|Either you don't have permission to view this dashboard or the dashboard has not been setup. Please check your permission settings with your administrator or check your dashboard configurations to proceed."
msgstr ""
@ -42737,9 +42746,6 @@ msgstr ""
msgid "SecurityReports|Results show vulnerabilities introduced by the merge request, in addition to existing vulnerabilities from the latest successful pipeline in your project's default branch."
msgstr ""
msgid "SecurityReports|Save comment"
msgstr ""
msgid "SecurityReports|Scan details"
msgstr ""