Add latest changes from gitlab-org/gitlab@master

This commit is contained in: parent f99954f2d2, commit 9c10dfefc2
@@ -150,11 +150,11 @@ export default {
     ),
     namePlaceholder: s__('MlModelRegistry|For example my-model'),
     versionDescription: s__('MlModelRegistry|Leave empty to skip version creation.'),
-    versionPlaceholder: s__('MlModelRegistry|For example 1.0.0'),
-    descriptionPlaceholder: s__('MlModelRegistry|Enter some model description'),
-    versionDescriptionTitle: s__('MlModelRegistry|Version Description'),
+    versionPlaceholder: s__('MlModelRegistry|For example 1.0.0. Must be a semantic version.'),
+    descriptionPlaceholder: s__('MlModelRegistry|Enter a model description'),
+    versionDescriptionTitle: s__('MlModelRegistry|Version description'),
     versionDescriptionPlaceholder: s__(
-      'MlModelRegistry|Initial version name. Must be a semantic version.',
+      'MlModelRegistry|Enter a description for this version of the model.',
     ),
     buttonTitle: s__('MlModelRegistry|Create model'),
     title: s__('MlModelRegistry|Create model, version & import artifacts'),
@@ -4,15 +4,33 @@
 breaking_change: true
 reporter: sam.white
 body: |  # Do not modify this line, instead modify the lines below.
-  All functionality related to GitLab's Container Network Security and Container Host Security categories is deprecated in GitLab 14.8 and scheduled for removal in GitLab 15.0. Users who need a replacement for this functionality are encouraged to evaluate the following open source projects as potential solutions that can be installed and managed outside of GitLab: [AppArmor](https://gitlab.com/apparmor/apparmor), [Cilium](https://github.com/cilium/cilium), [Falco](https://github.com/falcosecurity/falco), [FluentD](https://github.com/fluent/fluentd), [Pod Security Admission](https://kubernetes.io/docs/concepts/security/pod-security-admission/). To integrate these technologies into GitLab, add the desired Helm charts into your copy of the [Cluster Management Project Template](https://docs.gitlab.com/ee/user/clusters/management_project_template.html). Deploy these Helm charts in production by calling commands through GitLab [CI/CD](https://docs.gitlab.com/ee/user/clusters/agent/ci_cd_workflow.html).
+  All functionality related to GitLab's Container Network Security and
+  Container Host Security categories is deprecated in GitLab 14.8 and
+  scheduled for removal in GitLab 15.0. Users who need a replacement for this
+  functionality are encouraged to evaluate the following open source projects
+  as potential solutions that can be installed and managed outside of GitLab:
+  [AppArmor](https://gitlab.com/apparmor/apparmor),
+  [Cilium](https://github.com/cilium/cilium),
+  [Falco](https://github.com/falcosecurity/falco),
+  [FluentD](https://github.com/fluent/fluentd),
+  [Pod Security Admission](https://kubernetes.io/docs/concepts/security/pod-security-admission/).

-  As part of this change, the following specific capabilities within GitLab are now deprecated, and are scheduled for removal in GitLab 15.0:
+  To integrate these technologies into GitLab, add the desired Helm charts
+  into your copy of the
+  [Cluster Management Project Template](https://docs.gitlab.com/ee/user/clusters/management_project_template.html).
+  Deploy these Helm charts in production by calling commands through GitLab
+  [CI/CD](https://docs.gitlab.com/ee/user/clusters/agent/ci_cd_workflow.html).
+
+  As part of this change, the following specific capabilities within GitLab
+  are now deprecated, and are scheduled for removal in GitLab 15.0:

   - The **Security & Compliance > Threat Monitoring** page.
   - The `Network Policy` security policy type, as found on the **Security & Compliance > Policies** page.
   - The ability to manage integrations with the following technologies through GitLab: AppArmor, Cilium, Falco, FluentD, and Pod Security Policies.
   - All APIs related to the above functionality.

-  For additional context, or to provide feedback regarding this change, please reference our open [deprecation issue](https://gitlab.com/groups/gitlab-org/-/epics/7476).
+  For additional context, or to provide feedback regarding this change,
+  please reference our open
+  [deprecation issue](https://gitlab.com/groups/gitlab-org/-/epics/7476).
 # The following items are not published on the docs page, but may be used in the future.
 stage: "Protect"
@@ -16,8 +16,20 @@ DETAILS:

 Silent Mode allows you to silence outbound communication, such as emails, from GitLab. Silent Mode is not intended to be used on environments that are in use. Two use cases are:

-- Validating Geo site promotion. You have a secondary Geo site as part of your [disaster recovery](../geo/disaster_recovery/index.md) solution. You want to regularly test promoting it to become a primary Geo site, as a best practice to ensure your disaster recovery plan actually works. But you don't want to actually perform an entire failover, since the primary site lives in a region which provides the lowest latency to your users. And you don't want to take downtime during every regular test. So, you let the primary site remain up, while you promote the secondary site. You start smoke testing the promoted site. But, the promoted site starts emailing users, the push mirrors push changes to external Git repositories, etc. This is where Silent Mode comes in. You can enable it as part of site promotion, to avoid this issue.
-- Validating GitLab backups. You set up a testing instance to test that your backups restore successfully. As part of the restore, you enable Silent Mode, for example to avoid sending invalid emails to users.
+- Validating Geo site promotion. You have a secondary Geo site as part of your
+  [disaster recovery](../geo/disaster_recovery/index.md) solution. You want to
+  regularly test promoting it to become a primary Geo site, as a best practice
+  to ensure your disaster recovery plan actually works. But you don't want to
+  actually perform an entire failover, since the primary site lives in a region
+  which provides the lowest latency to your users. And you don't want to take
+  downtime during every regular test. So, you let the primary site remain up,
+  while you promote the secondary site. You start smoke testing the promoted
+  site. But, the promoted site starts emailing users, the push mirrors push
+  changes to external Git repositories, etc. This is where Silent Mode comes in.
+  You can enable it as part of site promotion, to avoid this issue.
+- Validating GitLab backups. You set up a testing instance to test that your
+  backups restore successfully. As part of the restore, you enable Silent Mode,
+  for example to avoid sending invalid emails to users.

 ## Enable Silent Mode
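As context for the hunk above: Silent Mode is an instance-wide application setting, so one way to flip it is through the REST application settings API. A minimal sketch, assuming the `silent_mode_enabled` attribute name and an administrator token (verify both against your GitLab version):

```python
# Minimal sketch: toggle Silent Mode via the application settings API.
# GITLAB_URL and TOKEN are placeholders; `silent_mode_enabled` is an
# assumption to verify against the settings API reference.
import requests

GITLAB_URL = "https://gitlab.example.com"
TOKEN = "glpat-..."  # administrator personal access token (placeholder)

def set_silent_mode(enabled: bool) -> None:
    """Enable or disable Silent Mode instance-wide."""
    response = requests.put(
        f"{GITLAB_URL}/api/v4/application/settings",
        headers={"PRIVATE-TOKEN": TOKEN},
        data={"silent_mode_enabled": enabled},
        timeout=30,
    )
    response.raise_for_status()

# Example: silence outbound email before smoke testing a promoted Geo site.
set_silent_mode(True)
```

Enabling it as part of Geo site promotion, as described above, keeps the promoted site from emailing users or pushing to mirrors during the test.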
@@ -10,7 +10,7 @@ DETAILS:
 **Tier:** Premium, Ultimate
 **Offering:** GitLab.com, Self-managed, GitLab Dedicated

-Every API call for [merge train](../ci/pipelines/merge_trains.md) must be authenticated with at lease the Developer [role](../user/permissions.md).
+Every API call for [merge train](../ci/pipelines/merge_trains.md) must be authenticated with at least the Developer [role](../user/permissions.md).

 If a user is not a member of a project and the project is private, a `GET` request on that project returns a `404` status code.
@@ -21,7 +21,7 @@ If Merge Trains is not available for the project, a `403` status code is returned
 By default, `GET` requests return 20 results at a time because the API results
 are paginated.

-Read more on [pagination](rest/index.md#pagination).
+For more information, see [pagination](rest/index.md#pagination).

 ## List Merge Trains for a project
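As a worked example of the authentication and pagination rules above, here is a sketch that pages through a project's merge trains using the documented `GET /projects/:id/merge_trains` endpoint. The instance URL, token, and project ID are placeholders, and response field names should be checked against the API reference:

```python
# Sketch: page through a project's merge trains, 20 results at a time.
import requests

GITLAB_URL = "https://gitlab.example.com"  # placeholder instance URL
TOKEN = "glpat-..."  # token for a user with at least the Developer role
PROJECT_ID = 42  # placeholder project ID

def list_merge_trains(project_id: int, per_page: int = 20):
    """Yield merge train entries across all pages."""
    page = 1
    while True:
        response = requests.get(
            f"{GITLAB_URL}/api/v4/projects/{project_id}/merge_trains",
            headers={"PRIVATE-TOKEN": TOKEN},
            params={"per_page": per_page, "page": page},
            timeout=30,
        )
        # 404 if the project is private and we are not a member,
        # 403 if Merge Trains is not available for the project.
        response.raise_for_status()
        batch = response.json()
        if not batch:
            return
        yield from batch
        page += 1

for entry in list_merge_trains(PROJECT_ID):
    print(entry)
```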
@@ -63,7 +63,23 @@ Let's use an example. Let's say we want to change feature flag `lorem_ipsum_dolar`

 > `/chatops run feature set lorem_ipsum_dolar ayufan`

-The command only change the flag for that actor on our Primary Cell. All other cells will be ignored. We may be able to expand Chatops to be able to accept an added flag such that we could directly set the actor on a particular Cell. Doing so will require the Engineer to know which Cell an actor resides. The reason for these limitations is to account for the fact that users and projects may be spread across a multitude of Cells. Cells are also being designed such that we can migrate data from one Cell to another. Feature Flag data is stored as a setting on a Cell and thus the metadata associated with what flags are set are not part of the knowledge associated with an actor. This introduces risk that if we target an actor, and later that actor is moved, the flag would no longer be set properly. This will lead to differing behavior for a given actor, but normally these types of changes happen to internal customers to GitLab reducing risk that users will notice a behavioral change as they switch between Cells. This implementation is also simplistic, removing the need to query _some service_ which hosts the Cells and actor resides, and needing to develop a specialized rollout procedure when the resulting target may be more than a single Cell. This is discussed a bit more in the next section.
+The command only changes the flag for that actor on our Primary Cell. All other
+Cells will be ignored. We may be able to expand ChatOps to accept an added
+flag such that we could directly set the actor on a particular Cell. Doing so
+will require the Engineer to know which Cell an actor resides on. The reason for
+these limitations is to account for the fact that users and projects may be spread
+across a multitude of Cells. Cells are also being designed such that we can migrate
+data from one Cell to another. Feature Flag data is stored as a setting on a Cell,
+and thus the metadata associated with what flags are set is not part of the
+knowledge associated with an actor. This introduces the risk that if we target an
+actor, and later that actor is moved, the flag would no longer be set properly.
+This will lead to differing behavior for a given actor, but normally these types
+of changes happen to internal customers of GitLab, reducing the risk that users will
+notice a behavioral change as they switch between Cells. This implementation is
+also simplistic, removing the need to query _some service_ to determine which
+Cell hosts an actor, and the need to develop a specialized rollout procedure when
+the resulting target may be more than a single Cell. This is discussed a bit more
+in the next section.
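As a hedged sketch of what this per-actor change amounts to on the Primary Cell: the ChatOps command wraps GitLab's Features API (`POST /api/v4/features/:name`), which accepts an actor such as a username. Directly targeting a particular Cell, as discussed above, is hypothetical; today the request only affects the instance it is sent to:

```python
# Sketch: set a feature flag for a single actor on one Cell only.
# PRIMARY_CELL_URL and ADMIN_TOKEN are placeholders.
import requests

PRIMARY_CELL_URL = "https://gitlab.example.com"  # hypothetical Primary Cell
ADMIN_TOKEN = "glpat-..."  # placeholder administrator token

def set_flag_for_actor(flag: str, username: str, value: bool) -> None:
    """Mirror `/chatops run feature set <flag> <username>` on a single Cell."""
    response = requests.post(
        f"{PRIMARY_CELL_URL}/api/v4/features/{flag}",
        headers={"PRIVATE-TOKEN": ADMIN_TOKEN},
        data={"value": str(value).lower(), "user": username},
        timeout=30,
    )
    response.raise_for_status()

set_flag_for_actor("lorem_ipsum_dolar", "ayufan", True)
```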

 ##### Engagement on Environments
@@ -73,7 +89,20 @@ Let's use an example. Let's say we want to enable the flag `lorem_ipsum_dolar`

 > `/chatops run feature set lorem_ipsum_dolar true --production`

-This command will need to perform a lot of work. Firstly, it needs to gather all Cells where this feature flag exists. If the flag does not exist on any Cell, we must not change this as this introduces an consistency issue between the Engineer expectations and that of the Production environment. We may consider an override in the case we are attempting to leverage this to mitigate an incident of a deployment that is not fully completed, however. If the flag does exist across all Cells of a given environment, we then begin to roll that change out across all Cells. It would be inadvisable to change all Cells at the same time. Chatops now needs the ability to have some mechanism to make the change to a given list of Cells, wait for some signal, then proceeding to the next list of Cells. Repeating until completion. We may need a mechanism to bypass this intentionally built slow rollout if we are targeting a flag that may remediate an incident across all Cells. The Delivery team plan on using a Ring style of deployments for Cells, we may be able to leverage similar metadata to assist in rollouts for this use case.
+This command will need to perform a lot of work. First, it needs to gather all
+Cells where this feature flag exists. If the flag does not exist on any Cell, we
+must not change it, as this introduces a consistency issue between the Engineer's
+expectations and the Production environment. We may consider an override
+in the case we are attempting to leverage this to mitigate an incident of a
+deployment that is not fully completed, however. If the flag does exist across
+all Cells of a given environment, we then begin to roll that change out across
+all Cells. It would be inadvisable to change all Cells at the same time. ChatOps
+now needs some mechanism to make the change to a given list of Cells, wait for
+some signal, then proceed to the next list of Cells, repeating until completion.
+We may need a mechanism to bypass this intentionally slow rollout if we are
+targeting a flag that may remediate an incident across all Cells. The Delivery
+team plans to use a Ring style of deployments for Cells; we may be able to
+leverage similar metadata to assist in rollouts for this use case (see the
+sketch below).
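A sketch of the ring-by-ring rollout described above: change one list of Cells, wait for a signal, then proceed. The Cell inventory, ring grouping, soak time, and health signal are all hypothetical placeholders; a real implementation would live in ChatOps or `release-tools` and watch alerts rather than a readiness probe:

```python
# Sketch: batched feature flag rollout across rings of Cells.
import time
import requests

ADMIN_TOKEN = "glpat-..."  # placeholder administrator token

# Hypothetical ring layout: lists of Cell base URLs, innermost ring first.
RINGS = [
    ["https://cell-a.example.com"],  # ring 0 (canary)
    ["https://cell-b.example.com", "https://cell-c.example.com"],  # ring 1
]

def set_flag(cell_url: str, flag: str, value: bool) -> None:
    response = requests.post(
        f"{cell_url}/api/v4/features/{flag}",
        headers={"PRIVATE-TOKEN": ADMIN_TOKEN},
        data={"value": str(value).lower()},
        timeout=30,
    )
    response.raise_for_status()

def cell_is_healthy(cell_url: str) -> bool:
    # Placeholder signal; a real rollout would watch error budgets and alerts.
    return requests.get(f"{cell_url}/-/readiness", timeout=10).ok

def rollout(flag: str, value: bool, soak_seconds: int = 600) -> None:
    """Change one ring at a time, halting if any Cell looks unhealthy."""
    for ring in RINGS:
        for cell in ring:
            set_flag(cell, flag, value)
        time.sleep(soak_seconds)  # the "wait for some signal" step
        if not all(cell_is_healthy(cell) for cell in ring):
            raise RuntimeError(f"Rollout halted: unhealthy Cell in {ring}")

rollout("lorem_ipsum_dolar", True)
```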

 #### Requirements
@@ -491,7 +491,19 @@ Feature Flags are discussed in [data-stores#83](https://gitlab.com/gitlab-org/en

 #### Package Rollout Policy

-We have an implicit procedure driven by our current use of auto-deploys. This will become more prominent with Cells. As implied in various formats above, auto-deploy shall operate relatively similarly to how it operates today. Cells becomes an addition to the existing `release-tools` pipeline with triggers in differing areas. When and what we trigger will need to be keenly defined. It is expected that Secondary Cells only receive `graduated` versions of GitLab. Thus, we'll leverage the use of our Post Deployment Migration pipeline as the gatekeeper for when a package is considered `graduated`. In an ideal world, when the PDM is executed successfully on the Primary Cell, that package is then considered `graduated` and can be deployed to any outer ring. This same concept is already leveraged when we build releases for self managed customers. This break point is already natural to Release Managers and thus is a good carry over for Cell deployments.
+We have an implicit procedure driven by our current use of auto-deploys. This
+will become more prominent with Cells. As implied in various formats above,
+auto-deploy shall operate relatively similarly to how it operates today. Cells
+become an addition to the existing `release-tools` pipeline with triggers in
+differing areas. When and what we trigger will need to be keenly defined. It is
+expected that Secondary Cells only receive `graduated` versions of GitLab. Thus,
+we'll leverage our Post Deployment Migration pipeline as the
+gatekeeper for when a package is considered `graduated`. In an ideal world, when
+the PDM is executed successfully on the Primary Cell, that package is then
+considered `graduated` and can be deployed to any outer ring. This same concept
+is already leveraged when we build releases for self-managed customers. This
+break point is already natural to Release Managers and thus is a good carry-over
+for Cell deployments.

 We should aim to deploy to Cells as quickly as possible. For all Cells that exist in a single ring, we should have the ability to deploy in parallel. Doing so minimizes the version drift between Cells and reduces potential issues. If the version drifts too greatly, auto-deploy shall pause itself and an investigation into the reason why we are too far behind begins. Ideally we know about this situation ahead of time. We should aim to be no greater than 1 `graduated` package behind our PDM. Thus the expectation is that for every PDM, there is a deployment to our Cells, every day. There are days on which the PDM is skipped. We'll need to evaluate on a case-by-case basis why the PDM is halted to determine the detriment this will incur on our Cell deployments.
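The `graduated` gate described above reduces to a simple predicate; a sketch under stated assumptions (the PDM success signal and target names are hypothetical stand-ins for `release-tools` state):

```python
# Sketch: a package may reach Secondary Cells (outer rings) only after
# the Post Deployment Migration (PDM) pipeline succeeds on the Primary Cell.
from dataclasses import dataclass

@dataclass
class Package:
    version: str
    pdm_succeeded_on_primary: bool  # assumed signal from the PDM pipeline

def is_graduated(package: Package) -> bool:
    return package.pdm_succeeded_on_primary

def deploy_targets(package: Package) -> list[str]:
    # Auto-deploy always reaches the Primary Cell; outer rings wait for graduation.
    targets = ["primary-cell"]
    if is_graduated(package):
        targets.append("secondary-cells")
    return targets

print(deploy_targets(Package("16.9.2024020112", pdm_succeeded_on_primary=True)))
```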
@@ -505,30 +517,62 @@ No. Our current labeling schema is primarily to showcase that the commit landed

 **A P1/S1 issue exists, how do we mitigate this on Cells?**

-Cells are still a part of .com, thus our existing [bug](https://handbook.gitlab.com/handbook/engineering/infrastructure/engineering-productivity/issue-triage/#severity-slos) and [vulnerability](https://handbook.gitlab.com/handbook/security/threat-management/vulnerability-management/#remediation-slas) SLA's for remediation apply. We can deploy whatever we want to secondary cells so long as it's considered `graduated`. If a high priority issue comes about, we should be able to freely leverage our existing procedures to update our code base and any given auto-deploy branch for mitigation, and maybe after some extra rounds of testing, or perhaps a slower roll out, we can deploy that auto-deploy package into our cells. This provides us with the same mitigation methods that we leverage today. The problem that this causes is that there could exist some code that may not have been fully vetted. We can still rely on rollbacks in this case and revisit any necessary patch for the next round of auto-deployments and evaluate the fix for another attempt to remediate our cells.
+Cells are still a part of .com, thus our existing
+[bug](https://handbook.gitlab.com/handbook/engineering/infrastructure/engineering-productivity/issue-triage/#severity-slos)
+and [vulnerability](https://handbook.gitlab.com/handbook/security/threat-management/vulnerability-management/#remediation-slas)
+SLAs for remediation apply. We can deploy whatever we want to Secondary Cells
+so long as it's considered `graduated`. If a high-priority issue comes about, we
+should be able to freely leverage our existing procedures to update our code
+base and any given auto-deploy branch for mitigation, and maybe after some extra
+rounds of testing, or perhaps a slower rollout, we can deploy that auto-deploy
+package into our Cells. This provides us with the same mitigation methods that
+we leverage today. The problem this causes is that there could exist some
+code that may not have been fully vetted. We can still rely on rollbacks in this
+case and revisit any necessary patch for the next round of auto-deployments and
+evaluate the fix for another attempt to remediate our Cells.

 **What changes are expected from a Developer's perspective?**

-Release and Auto-Deploy procedures should largely remain the same. We're shifting where code lands. Any changes in this realm would increase the most the closer we are to Iteration 2.0 when various environments or stages to GitLab begin to change.
+Release and Auto-Deploy procedures should largely remain the same. We're
+shifting where code lands. Any changes in this realm would increase the most the
+closer we are to Iteration 2.0, when various environments or stages of GitLab
+begin to change.

 **All tiers but one have a failed deploy, what triggers a rollback of that package for all cells?**

-This depends on various characteristics that we'll probably want to iterate on and develop processes for. Example, if we fail on the very first cell on the first Tier, we should investigate that cell, but also ensure that this is not systemic to all cells. This can only be handled on a case-by-case basis. If we reach the last tier and last cell and some failure would occur, there should be no reason to rollback any other cell as enough time should have passed by for us to catch application failures.
+This depends on various characteristics that we'll probably want to iterate on
+and develop processes for. For example, if we fail on the very first Cell on the
+first Tier, we should investigate that Cell, but also ensure that the failure is
+not systemic to all Cells. This can only be handled on a case-by-case basis. If
+we reach the last tier and last Cell and some failure were to occur, there should
+be no reason to roll back any other Cell, as enough time should have passed for
+us to catch application failures.

 **What happens with self-managed releases?**

-Theoretically not much changes. Currently we use Production, or .com's Main Stage as our proving grounds for changes that are destined to be releasable for self-managed. This does not change as in the Cellular architecture, this notion for this exists in the same place. The vocabulary changes, in this case, a `graduated` package is now considered safe for a release.
+Theoretically not much changes. Currently we use Production, or .com's Main
+Stage, as our proving grounds for changes that are destined to be releasable for
+self-managed. This does not change: in the Cellular architecture, this notion
+exists in the same place. The vocabulary changes; in this case, a
+`graduated` package is now considered safe for a release.

 **What happens to PreProd?**

-This instance specifically tests the hybrid installation of a GitLab package and Helm chart when we create release candidates. It's our last step prior to a release being tagged. This is not impacted by the Cells work. Though we may change how preprod is managed.
+This instance specifically tests the hybrid installation of a GitLab package and
+Helm chart when we create release candidates. It's our last step prior to a
+release being tagged. This is not impacted by the Cells work, though we may
+change how PreProd is managed.

 **What happens with Staging?**

-Staging is crucial for long term instance testing of a deployment alongside QA. Hypothetically staging could completely go away in favor of a deployment to Tier 0. Reference the above Iteration 3 {+TODO add proper link+}
+Staging is crucial for long-term instance testing of a deployment alongside QA.
+Hypothetically, Staging could completely go away in favor of a deployment to
+Tier 0. Reference the above Iteration 3 {+TODO add proper link+}

 **What happens to Ops?**

-No need to change. But if Cell management becomes easy, it would be prudent to make this installation operate as similar as possible to avoid overloading operations teams with unique knowledge for our many instances.
+No need to change. But if Cell management becomes easy, it would be prudent to
+make this installation operate as similarly as possible to avoid overloading
+operations teams with unique knowledge for our many instances.

 This same answer could be provided for the Dev instance.
@@ -17,23 +17,61 @@ iterate on the tooling. The content below is a historical version of the
 blueprint, written prior to incorporating database testing into our development
 workflow.

-We have identified [common themes of reverted migrations](https://gitlab.com/gitlab-org/gitlab/-/issues/233391) and discovered failed migrations breaking in both production and staging even when successfully tested in a developer environment. We have also experienced production incidents even with successful testing in staging. These failures are quite expensive: they can have a significant effect on availability, block deployments, and generate incident escalations. These escalations must be triaged and either reverted or fixed forward. Often, this can take place without the original author's involvement due to time zones and/or the criticality of the escalation. With our increased deployment speeds and stricter uptime requirements, the need for improving database testing is critical, particularly earlier in the development process (shift left).
+We have identified [common themes of reverted migrations](https://gitlab.com/gitlab-org/gitlab/-/issues/233391) and discovered
+failed migrations breaking in both production and staging even when successfully
+tested in a developer environment. We have also experienced production incidents
+even with successful testing in staging. These failures are quite expensive:
+they can have a significant effect on availability, block deployments, and
+generate incident escalations. These escalations must be triaged and either
+reverted or fixed forward. Often, this can take place without the original
+author's involvement due to time zones and/or the criticality of the escalation.
+With our increased deployment speeds and stricter uptime requirements, the need
+for improving database testing is critical, particularly earlier in the
+development process (shift left).

-From a developer's perspective, it is hard, if not unfeasible, to validate a migration on a large enough dataset before it goes into production.
+From a developer's perspective, it is hard, if not infeasible, to validate a
+migration on a large enough dataset before it goes into production.

-Our primary goal is to **provide developers with immediate feedback for new migrations and other database-related changes tested on a full copy of the production database**, and to do so with high levels of efficiency (particularly in terms of infrastructure costs) and security.
+Our primary goal is to
+**provide developers with immediate feedback for new migrations and other database-related changes tested on a full copy of the production database**,
+and to do so with high levels of efficiency (particularly in terms of infrastructure costs) and security.

 ## Current day

-Developers are expected to test database migrations prior to deploying to any environment, but we lack the ability to perform testing against large environments such as GitLab.com. The [developer database migration style guide](../../../development/migration_style_guide.md) provides guidelines on migrations, and we focus on validating migrations during code review and testing in CI and staging.
+Developers are expected to test database migrations prior to deploying to any
+environment, but we lack the ability to perform testing against large
+environments such as GitLab.com. The [developer database migration style guide](../../../development/migration_style_guide.md)
+provides guidelines on migrations, and we focus on validating migrations during code review and testing
+in CI and staging.

-The [code review phase](../../../development/database_review.md) involves Database Reviewers and Maintainers to manually check the migrations committed. This often involves knowing and spotting problematic patterns and their particular behavior on GitLab.com from experience. There is no large-scale environment available that allows us to test database migrations before they are being merged.
+The [code review phase](../../../development/database_review.md) relies on
+Database Reviewers and Maintainers to manually check the migrations committed.
+This often involves knowing and spotting problematic patterns and their
+particular behavior on GitLab.com from experience. There is no large-scale
+environment available that allows us to test database migrations before they
+are merged.

-Testing in CI is done on a very small database. We mainly check forward/backward migration consistency, evaluate RuboCop rules to detect well-known problematic behaviors (static code checking) and have a few other, rather technical checks in place (adding the right files etc). That is, we typically find code or other rather simple errors, but cannot surface any data related errors - which are also typically not covered by unit tests either.
+Testing in CI is done on a very small database. We mainly check forward/backward
+migration consistency, evaluate RuboCop rules to detect well-known problematic
+behaviors (static code checking) and have a few other, rather technical checks
+in place (adding the right files, etc.). That is, we typically find code or other
+rather simple errors, but cannot surface any data-related errors - which are
+also typically not covered by unit tests either.

-Once merged, migrations are being deployed to the staging environment. Its database size is less than 5% of the production database size as of January 2021 and its recent data distribution does not resemble the production site. Oftentimes, we see migrations succeed in staging but then fail in production due to query timeouts or other unexpected problems. Even if we caught problems in staging, this is still expensive to reconcile and ideally we want to catch those problems as early as possible in the development cycle.
+Once merged, migrations are deployed to the staging environment. Its
+database size is less than 5% of the production database size as of January 2021
+and its recent data distribution does not resemble the production site.
+Oftentimes, we see migrations succeed in staging but then fail in production due
+to query timeouts or other unexpected problems. Even if we caught problems in
+staging, this is still expensive to reconcile, and ideally we want to catch those
+problems as early as possible in the development cycle.

-Today, we have gained experience with working on a thin-cloned production database (more on this below) and already use it to provide developers with access to production query plans, automated query feedback and suggestions with optimizations. This is built around [Database Lab](https://gitlab.com/postgres-ai/database-lab) and [Joe](https://gitlab.com/postgres-ai/joe), both available through Slack (using ChatOps) and [postgres.ai](https://postgres.ai/).
+Today, we have gained experience with working on a thin-cloned production
+database (more on this below) and already use it to provide developers with
+access to production query plans, automated query feedback and suggestions with
+optimizations. This is built around [Database Lab](https://gitlab.com/postgres-ai/database-lab)
+and [Joe](https://gitlab.com/postgres-ai/joe), both available through Slack
+(using ChatOps) and [postgres.ai](https://postgres.ai/).

 ## Vision
@@ -60,25 +98,46 @@ For database queries, we can automatically gather:
 After having gotten that feedback:

 1. I can go back and investigate a performance problem with the data migration.
-1. Once I have a fix pushed, I can repeat the above cycle and eventually send my merge request for database review. During the database review, the database reviewer and maintainer have all the additional generated information available to them to make an informed decision on the performance of the introduced changes.
+1. Once I have a fix pushed, I can repeat the above cycle and eventually send my
+   merge request for database review. During the database review, the database
+   reviewer and maintainer have all the additional generated information
+   available to them to make an informed decision on the performance of the
+   introduced changes.

-This information gathering is done in a protected and safe environment, making sure that there is no unauthorized access to production data and we can safely execute code in this environment.
+This information gathering is done in a protected and safe environment, making
+sure that there is no unauthorized access to production data and we can safely
+execute code in this environment.

 The intended benefits include:

-- Shifting left: Allow developers to understand large-scale database performance and what to expect to happen on GitLab.com in a self-service manner
-- Identify errors that are only generated when working against a production scale dataset with real data (with inconsistencies or unexpected patterns)
-- Automate the information gathering phase to make it easier for everybody involved in code review (developer, reviewer, maintainer) by providing relevant details automatically and upfront.
+- Shifting left: Allow developers to understand large-scale database performance
+  and what to expect to happen on GitLab.com in a self-service manner.
+- Identify errors that are only generated when working against a production-scale
+  dataset with real data (with inconsistencies or unexpected patterns).
+- Automate the information gathering phase to make it easier for everybody
+  involved in code review (developer, reviewer, maintainer) by providing
+  relevant details automatically and upfront.

 ## Technology and next steps

-We already use Database Lab from [postgres.ai](https://postgres.ai/), which is a thin-cloning technology. We maintain a PostgreSQL replica which is up to date with production data but does not serve any production traffic. This runs Database Lab which allows us to quickly create a full clone of the production dataset (in the order of seconds).
+We already use Database Lab from [postgres.ai](https://postgres.ai/), which is a
+thin-cloning technology. We maintain a PostgreSQL replica which is up to date
+with production data but does not serve any production traffic. This runs
+Database Lab, which allows us to quickly create a full clone of the production
+dataset (in the order of seconds).

-Internally, this is based on ZFS and implements a "thin-cloning technology". That is, ZFS snapshots are being used to clone the data and it exposes a full read/write PostgreSQL cluster based on the cloned data. This is called a *thin clone*. It is rather short lived and is going to be destroyed again shortly after we are finished using it.
+Internally, this is based on ZFS and implements a "thin-cloning technology".
+That is, ZFS snapshots are used to clone the data, and Database Lab exposes a full
+read/write PostgreSQL cluster based on the cloned data. This is called a *thin clone*.
+It is rather short-lived and is going to be destroyed again shortly
+after we are finished using it.

 A thin clone is fully read/write. This allows us to execute migrations on top of it.
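As a sketch of driving that thin-clone lifecycle from automation: create a clone, run the merge request's migrations against it, and destroy it. The endpoint paths, auth header, and response fields are hypothetical placeholders; consult the Database Lab documentation for the real interface:

```python
# Sketch: thin-clone lifecycle around a migration test run.
# DBLAB_API, the token header, and response field names are assumptions.
import os
import subprocess
import requests

DBLAB_API = "https://dblab.internal.example.com"  # hypothetical host
TOKEN = "dblab-token"  # placeholder verification token

def create_clone() -> dict:
    """Ask Database Lab for a fresh read/write thin clone (takes seconds)."""
    response = requests.post(
        f"{DBLAB_API}/clone",
        headers={"Verification-Token": TOKEN},
        json={"protected": False},
        timeout=60,
    )
    response.raise_for_status()
    return response.json()  # assumed to contain an id and connection URL

def destroy_clone(clone_id: str) -> None:
    requests.delete(
        f"{DBLAB_API}/clone/{clone_id}",
        headers={"Verification-Token": TOKEN},
        timeout=60,
    ).raise_for_status()

clone = create_clone()
try:
    # Run the merge request's migrations against the clone's database.
    subprocess.run(
        ["bundle", "exec", "rails", "db:migrate"],
        env={**os.environ, "DATABASE_URL": clone["connection_url"]},  # assumed field
        check=True,
    )
finally:
    destroy_clone(clone["id"])  # thin clones are short-lived by design
```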

-Database Lab provides an API we can interact with to manage thin clones. In order to automate the migration and query testing, we add steps to the `gitlab/gitlab-org` CI pipeline. This triggers automation that performs the following steps for a given merge request:
+Database Lab provides an API we can interact with to manage thin clones. In
+order to automate the migration and query testing, we add steps to the
+`gitlab/gitlab-org` CI pipeline. This triggers automation that performs the
+following steps for a given merge request:

 1. Create a thin-clone with production data for this testing session.
 1. Pull GitLab code from the merge request.
@@ -89,45 +148,106 @@ Database Lab provides an API we can interact with to manage thin clones. In order

 ### Short-term

-The short-term focus is on testing regular migrations (typically schema changes) and using the existing Database Lab instance from postgres.ai for it.
+The short-term focus is on testing regular migrations (typically schema changes)
+and using the existing Database Lab instance from postgres.ai for it.

-In order to secure this process and meet compliance goals, the runner environment is treated as a *production* environment and similarly locked down, monitored and audited. Only Database Maintainers have access to the CI pipeline and its job output. Everyone else can only see the results and statistics posted back on the merge request.
+In order to secure this process and meet compliance goals, the runner
+environment is treated as a *production* environment and similarly locked down,
+monitored and audited. Only Database Maintainers have access to the CI pipeline
+and its job output. Everyone else can only see the results and statistics posted
+back on the merge request.

-We implement a secured CI pipeline on [Internal GitLab for Operations](https://ops.gitlab.net/users/sign_in) that adds the execution steps outlined above. The goal is to secure this pipeline to solve the following problem:
+We implement a secured CI pipeline on [Internal GitLab for Operations](https://ops.gitlab.net/users/sign_in)
+that adds the execution steps outlined above. The goal is to secure this pipeline
+to solve the following problem:

-Make sure we strongly protect production data, even though we allow everyone (GitLab team/developers) to execute arbitrary code on the thin-clone which contains production data.
+Make sure we strongly protect production data, even though we allow everyone
+(GitLab team/developers) to execute arbitrary code on the thin-clone which contains production data.

-This is in principle achieved by locking down the GitLab Runner instance executing the code and its containers on a network level, such that no data can escape over the network. We make sure no communication can happen to the outside world from within the container executing the GitLab Rails code (and its database migrations).
+This is in principle achieved by locking down the GitLab Runner instance
+executing the code and its containers on a network level, such that no data can
+escape over the network. We make sure no communication can happen to the outside
+world from within the container executing the GitLab Rails code (and its
+database migrations).

-Furthermore, we limit the ability to view the results of the jobs (including the output printed from code) to Maintainer and Owner level on the [Internal GitLab for Operations](https://ops.gitlab.net/users/sign_in) pipeline and provide only a high level summary back to the original MR. If there are issues or errors in one of the jobs run, the database Maintainer assigned to review the MR can check the original job for more details.
+Furthermore, we limit the ability to view the results of the jobs (including the
+output printed from code) to Maintainer and Owner level on the
+[Internal GitLab for Operations](https://ops.gitlab.net/users/sign_in) pipeline and provide only
+a high-level summary back to the original MR. If there are issues or errors in
+one of the jobs run, the database Maintainer assigned to review the MR can check
+the original job for more details.

-With this step implemented, we already have the ability to execute database migrations on the thin-cloned GitLab.com database automatically from GitLab CI and provide feedback back to the merge request and the developer. The content of that feedback is expected to evolve over time and we can continuously add to this.
+With this step implemented, we already have the ability to execute database
+migrations on the thin-cloned GitLab.com database automatically from GitLab CI
+and provide feedback back to the merge request and the developer. The content of
+that feedback is expected to evolve over time and we can continuously add to
+this.

-We already have an [MVC-style implementation for the pipeline](https://gitlab.com/gitlab-org/database-team/gitlab-com-migrations) for reference and an [example merge request with feedback](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/50793#note_477815261) from the pipeline.
+We already have an
+[MVC-style implementation for the pipeline](https://gitlab.com/gitlab-org/database-team/gitlab-com-migrations)
+for reference and an [example merge request with feedback](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/50793#note_477815261)
+from the pipeline.

 The short-term goal is detailed in [this epic](https://gitlab.com/groups/gitlab-org/database-team/-/epics/6).

 ### Mid-term - Improved feedback, query testing and background migration testing

-Mid-term, we plan to expand the level of detail the testing pipeline reports back to the merge request and expand its scope to cover query testing, too. By doing so, we use our experience from database code reviews and using thin-clone technology and bring this back closer to the GitLab workflow. Instead of reaching out to different tools (`postgres.ai`, `joe`, Slack, plan visualizations, and so on) we bring this back to GitLab and working directly on the merge request.
+Mid-term, we plan to expand the level of detail the testing pipeline reports
+back to the merge request and expand its scope to cover query testing, too. By
+doing so, we use our experience from database code reviews and thin-clone
+technology and bring this back closer to the GitLab workflow. Instead of
+reaching out to different tools (`postgres.ai`, `joe`, Slack, plan
+visualizations, and so on) we bring this back to GitLab, working directly on
+the merge request.

-Secondly, we plan to cover background migrations testing, too. These are typically data migrations that are scheduled to run over a long period of time. The success of both the scheduling phase and the job execution phase typically depends a lot on data distribution - which only surfaces when running these migrations on actual production data. In order to become confident about a background migration, we plan to provide the following feedback:
+Secondly, we plan to cover background migration testing, too. These are
+typically data migrations that are scheduled to run over a long period of time.
+The success of both the scheduling phase and the job execution phase typically
+depends a lot on data distribution - which only surfaces when running these
+migrations on actual production data. In order to become confident about a
+background migration, we plan to provide the following feedback:

-1. Scheduling phase - query statistics (for example a histogram of query execution times), job statistics (how many jobs, overall duration, and so on), batch sizes.
+1. Scheduling phase - query statistics (for example a histogram of query
+   execution times), job statistics (how many jobs, overall duration, and so on),
+   batch sizes.
 1. Execution phase - using a few instances of a job as examples, we execute those to gather query and runtime statistics.

 ### Long-term - incorporate into GitLab product

-There are opportunities to discuss for extracting features from this into GitLab itself. For example, annotating the merge request with query examples and attaching feedback gathered from the testing run can become a first-class citizen instead of using merge request description and comments for it. We plan to evaluate those ideas as we see those being used in earlier phases and bring our experience back into the product.
+There are opportunities to discuss extracting features from this into GitLab
+itself. For example, annotating the merge request with query examples and
+attaching feedback gathered from the testing run can become a first-class
+citizen instead of using the merge request description and comments for it. We
+plan to evaluate those ideas as we see them being used in earlier phases and
+bring our experience back into the product.

 ## An alternative discussed: Anonymization

-At the core of this problem lies the concern about executing (potentially arbitrary) code on a production dataset and making sure the production data is well protected. The approach discussed above solves this by strongly limiting access to the output of said code.
+At the core of this problem lies the concern about executing (potentially arbitrary)
+code on a production dataset and making sure the production data is
+well protected. The approach discussed above solves this by strongly limiting
+access to the output of said code.

-An alternative approach we have discussed and abandoned is to "scrub" and anonymize production data. The idea is to remove any sensitive data from the database and use the resulting dataset for database testing. This has a lot of downsides which led us to abandon the idea:
+An alternative approach we have discussed and abandoned is to "scrub" and
+anonymize production data. The idea is to remove any sensitive data from the
+database and use the resulting dataset for database testing. This has a lot of
+downsides, which led us to abandon the idea:

-- Anonymization is complex by nature - it is a hard problem to call a "scrubbed clone" actually safe to work with in public. Different data types may require different anonymization techniques (for example, anonymizing sensitive information inside a JSON field) and only focusing on one attribute at a time does not guarantee that a dataset is fully anonymized (for example join attacks or using timestamps in conjunction to public profiles/projects to de-anonymize users by there activity).
-- Anonymization requires an additional process to keep track and update the set of attributes considered as sensitive, ongoing maintenance and security reviews every time the database schema changes.
-- Annotating data as "sensitive" is error prone, with the wrong anonymization approach used for a data type or one sensitive attribute accidentally not marked as such possibly leading to a data breach.
-- Scrubbing not only removes sensitive data, but it also changes data distribution, which greatly affects performance of migrations and queries.
-- Scrubbing heavily changes the database contents, potentially updating a lot of data, which leads to different data storage details (think MVC bloat), affecting performance of migrations and queries.
+- Anonymization is complex by nature - it is a hard problem to call a "scrubbed clone"
+  actually safe to work with in public. Different data types may require
+  different anonymization techniques (for example, anonymizing sensitive
+  information inside a JSON field) and only focusing on one attribute at a time
+  does not guarantee that a dataset is fully anonymized (for example, join
+  attacks or using timestamps in conjunction with public profiles/projects to
+  de-anonymize users by their activity).
+- Anonymization requires an additional process to keep track of and update the set
+  of attributes considered sensitive, ongoing maintenance and security
+  reviews every time the database schema changes.
+- Annotating data as "sensitive" is error-prone, with the wrong anonymization
+  approach used for a data type or one sensitive attribute accidentally not
+  marked as such possibly leading to a data breach.
+- Scrubbing not only removes sensitive data, but it also changes data
+  distribution, which greatly affects performance of migrations and queries.
+- Scrubbing heavily changes the database contents, potentially updating a lot of
+  data, which leads to different data storage details (think MVCC bloat),
+  affecting performance of migrations and queries.
@@ -12,14 +12,29 @@ approvers: [ ]

 The following represent our current DR challenges and are candidates for problems that we should address in this architecture blueprint.

-1. Postgres replicas run close to capacity and are scaled manually. New instances must go through Terraform CI pipelines and Chef configuration. Over-provisioning to absorb a zone failure would add significant cloud-spend (see proposal section at the end of the document for details).
-1. HAProxy (load balancing) is scaled manually and must go through Terraform CI pipelines and Chef configuration.
-1. CI runner managers are present in 2 availability zones and scaled close to capacity. New instances must go through Terraform CI pipelines and Chef configuration.
-1. In a zone there are saturation limits, like the number of replicas that need to be manually adjusted if load is shifted away from a failed availability zone.
-1. Gitaly `RPO` is limited by the frequency of disk snapshots, `RTO` is limited by the time it takes to provision and configure through Terraform CI pipelines and Chef configuration.
-1. Monitoring infrastructure that collects metrics from Chef managed VMs is redundant across 2 availability zones and scaled manually. New instances must go through Terraform CI pipelines and Chef configuration.
-1. The Chef server which is responsible for all configuration of Chef managed VMs is a single point of failure located in `us-central1`. It has a local Postgres database and files on local disk.
-1. The infrastructure (`dev.gitlab.org`) that builds Docker images and packages is located in a single region, and is a single point of failure.
+1. Postgres replicas run close to capacity and are scaled manually. New
+   instances must go through Terraform CI pipelines and Chef configuration.
+   Over-provisioning to absorb a zone failure would add significant cloud-spend
+   (see the proposal section at the end of the document for details).
+1. HAProxy (load balancing) is scaled manually and must go through Terraform CI
+   pipelines and Chef configuration.
+1. CI runner managers are present in 2 availability zones and scaled close to
+   capacity. New instances must go through Terraform CI pipelines and Chef
+   configuration.
+1. In a zone there are saturation limits, like the number of replicas, that need
+   to be manually adjusted if load is shifted away from a failed availability
+   zone.
+1. Gitaly `RPO` is limited by the frequency of disk snapshots; `RTO` is limited
+   by the time it takes to provision and configure through Terraform CI
+   pipelines and Chef configuration.
+1. Monitoring infrastructure that collects metrics from Chef-managed VMs is
+   redundant across 2 availability zones and scaled manually. New instances must
+   go through Terraform CI pipelines and Chef configuration.
+1. The Chef server, which is responsible for all configuration of Chef-managed
+   VMs, is a single point of failure located in `us-central1`. It has a local
+   Postgres database and files on local disk.
+1. The infrastructure (`dev.gitlab.org`) that builds Docker images and packages
+   is located in a single region, and is a single point of failure.

 ## Zonal recovery work-streams
|
|||
HAProxy is a fleet of Chef managed VMs that are statically allocated across 3 AZs in `us-east1`.
|
||||
In the case of a zonal outage we would need to rapidly scale this fleet, adding to our RTO.
|
||||
|
||||
In FY24Q4 the Foundations team started working on a proof-of-concept to use [Istio in non-prod environments](https://gitlab.com/groups/gitlab-com/gl-infra/-/epics/1157).
|
||||
We anticipate in FY25 to have a replacement for HAProxy using Istio and [GKE Gateway](https://cloud.google.com/kubernetes-engine/docs/concepts/gateway-api).
|
||||
Completing this work reduces the impact to our LoadBalancing layer for zonal outages, as it eliminates the need to manually scale the HAProxy fleet.
|
||||
Additionally, we spend around 17k/month on HAProxy nodes, so there may be a cloud-spend reduction if we are able to reduce this footprint.
|
||||
In FY24Q4 the Foundations team started working on a proof-of-concept to use
|
||||
[Istio in non-prod environments](https://gitlab.com/groups/gitlab-com/gl-infra/-/epics/1157).
|
||||
We anticipate in FY25 to have a replacement for HAProxy using Istio and
|
||||
[GKE Gateway](https://cloud.google.com/kubernetes-engine/docs/concepts/gateway-api).
|
||||
Completing this work reduces the impact to our LoadBalancing layer for zonal outages,
|
||||
as it eliminates the need to manually scale the HAProxy fleet.
|
||||
Additionally, we spend around 17k/month on HAProxy nodes, so there may be a
|
||||
cloud-spend reduction if we are able to reduce this footprint.
|
||||
|
||||
### Create an HA Chef server configuration to avoid an outage for a single zone failure
|
||||
|
||||
|
|
|
|||
|
|
@@ -15,55 +15,131 @@ participating-stages: [ "~group::incubation" ]

 ## Summary

-GitLab users can submit new issues and comments via email. Administrators configure special mailboxes that GitLab polls on a regular basis and fetches new unread emails. Based on the slug and a hash in the sub-addressing part of the email address, we determine whether this email will file an issue, add a Service Desk issue, or a comment to an existing issue.
+GitLab users can submit new issues and comments via email. Administrators
+configure special mailboxes that GitLab polls on a regular basis and fetches new
+unread emails. Based on the slug and a hash in the sub-addressing part of the
+email address, we determine whether this email will file an issue, add a Service
+Desk issue, or add a comment to an existing issue.

-Right now emails are ingested by a separate process called `mail_room`. We would like to stop ingesting emails via `mail_room` and instead use scheduled Sidekiq jobs to do this directly inside GitLab.
+Right now emails are ingested by a separate process called `mail_room`. We would
+like to stop ingesting emails via `mail_room` and instead use scheduled Sidekiq
+jobs to do this directly inside GitLab.

-This lays out the foundation for [custom email address ingestion for Service Desk](https://gitlab.com/gitlab-org/gitlab/-/issues/329990), detailed health logging and makes it easier to integrate other service provider adapters (for example Gmail via API). We will also reduce the infrastructure setup and maintenance costs for customers on self-managed and make it easier for team members to work with email ingestion in GDK.
+This lays the foundation for
+[custom email address ingestion for Service Desk](https://gitlab.com/gitlab-org/gitlab/-/issues/329990)
+and detailed health logging, and makes it easier to integrate other service provider
+adapters (for example, Gmail via API). We will also reduce the infrastructure setup
+and maintenance costs for customers on self-managed and make it easier for team members to work with email ingestion in GDK.

 ## Glossary

-- Email ingestion: Reading emails from a mailbox via IMAP or an API and forwarding it for processing (for example create an issue or add a comment)
-- Sub-addressing: An email address consist of a local part (everything before `@`) and a domain part. With email sub-addressing you can create unique variations of an email address by adding a `+` symbol followed by any text to the local part. You can use these sub-addresses to filter, categorize or distinguish between them as all these emails will be delivered to the same mailbox. For example `user+subaddress@example.com` and `user+1@example.com` and sub-addresses for `user@example.com`.
-- `mail_room`: [An executable script](https://gitlab.com/gitlab-org/ruby/gems/gitlab-mail_room) that spawns a new process for each configured mailbox, reads new emails on a regular basis and forwards the emails to a processing unit.
-- [`incoming_email`](../../../administration/incoming_email.md): An email address that is used for adding comments and issues via email. When you reply on a GitLab notification of an issue comment, this response email will go to the configured `incoming_email` mailbox, read via `mail_room` and processed by GitLab. You can also use this address as a Service Desk email address. The configuration is per instance and needs full IMAP or Microsoft Graph API credentials to access the mailbox.
-- [`service_desk_email`](../../../user/project/service_desk/configure.md#use-an-additional-service-desk-alias-email): Additional alias email address that is only used for Service Desk. You can also use an address generated from `incoming_email` to create Service Desk issues.
-- `delivery_method`: Administrators can define how `mail_room` forwards fetched emails to GitLab. The legacy and now deprecated approach is called `sidekiq`, which directly adds a new job to the Redis queue. The current and recommended way is called `webhook`, which sends a POST request to an internal GitLab API endpoint. This endpoint then adds a new job using the full framework for compressing job data etc. The downside is, that `mail_room` and GitLab need a shared key file, which might be challenging to distribute in large setups.
+- Email ingestion: Reading emails from a mailbox via IMAP or an API and
+  forwarding them for processing (for example, creating an issue or adding a comment).
+- Sub-addressing: An email address consists of a local part (everything before
+  `@`) and a domain part. With email sub-addressing you can create unique
+  variations of an email address by adding a `+` symbol followed by any text to
+  the local part. You can use these sub-addresses to filter, categorize or
+  distinguish between them, as all these emails will be delivered to the same
+  mailbox. For example, `user+subaddress@example.com` and `user+1@example.com`
+  are sub-addresses of `user@example.com`. (A small parsing sketch follows this
+  list.)
+- `mail_room`: [An executable script](https://gitlab.com/gitlab-org/ruby/gems/gitlab-mail_room) that spawns
+  a new process for each configured mailbox, reads new emails on a regular basis
+  and forwards the emails to a processing unit.
+- [`incoming_email`](../../../administration/incoming_email.md): An email
+  address that is used for adding comments and issues via email. When you reply
+  to a GitLab notification of an issue comment, this response email will go to
+  the configured `incoming_email` mailbox, be read via `mail_room` and processed by
+  GitLab. You can also use this address as a Service Desk email address. The
+  configuration is per instance and needs full IMAP or Microsoft Graph API
+  credentials to access the mailbox.
+- [`service_desk_email`](../../../user/project/service_desk/configure.md#use-an-additional-service-desk-alias-email):
+  Additional alias email address that is only used for Service Desk. You can
+  also use an address generated from `incoming_email` to create Service Desk
+  issues.
+- `delivery_method`: Administrators can define how `mail_room` forwards fetched
+  emails to GitLab. The legacy and now deprecated approach is called `sidekiq`,
+  which directly adds a new job to the Redis queue. The current and recommended
+  way is called `webhook`, which sends a POST request to an internal GitLab API
+  endpoint. This endpoint then adds a new job using the full framework for
+  compressing job data, etc. The downside is that `mail_room` and GitLab need a
+  shared key file, which might be challenging to distribute in large setups.
|
||||
|
||||
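As a toy illustration of sub-addressing, the following Ruby sketch builds and
splits such an address. The `%{key}` layout shown here is hypothetical; the
actual routing information GitLab encodes differs by handler.

```ruby
# Hypothetical sub-addressing example; the key layout is illustrative only.
address_template = "incoming+%{key}@example.com"
key = "my-group-my-project-13-issue-97" # slug plus reply identifiers (assumed)
address = format(address_template, key: key)
# => "incoming+my-group-my-project-13-issue-97@example.com"

local_part, _domain = address.split("@")
mailbox, subaddress = local_part.split("+", 2)
# mailbox    => "incoming" (all variations land in this one mailbox)
# subaddress => the routing information used to pick the correct handler
```
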
## Motivation

The current implementation lacks scalability and requires significant
infrastructure maintenance. Additionally, there is a lack of
[proper observability for configuration errors](https://gitlab.com/gitlab-org/gitlab/-/issues/384530) and
[overall system health](https://gitlab.com/groups/gitlab-org/-/epics/9407). Furthermore,
[setting up and providing support for multi-node Linux package (Omnibus) installations](https://gitlab.com/gitlab-org/gitlab/-/issues/391859)
is challenging, and periodic email ingestion issues necessitate reactive
support.

Because we are using a fork of the `mail_room` gem
([`gitlab-mail_room`](https://gitlab.com/gitlab-org/ruby/gems/gitlab-mail_room)),
which contains some GitLab-specific features that won't be ported upstream, we
have a notable maintenance overhead.

The [Service Desk Single-Engineer-Group (SEG)](https://handbook.gitlab.com/handbook/engineering/development/incubation/service-desk/)
started work on [customizable email addresses for Service Desk](https://gitlab.com/gitlab-org/gitlab/-/issues/329990) and
[released the first iteration in beta in `16.4`](https://about.gitlab.com/releases/2023/09/22/gitlab-16-4-released/#custom-email-address-for-service-desk). As an
[MVC, we introduced a `Forwarding & SMTP` mode](https://gitlab.com/gitlab-org/gitlab/-/issues/329990#note_1201344150)
where administrators set up email forwarding from their custom email address to
the project's `incoming_email` email address. They also provide SMTP credentials
so GitLab can send emails from the custom email address on their behalf. We
don't need any additional email ingestion other than the existing mechanics for
this approach to work.

As a second iteration we'd like to add Microsoft Graph support for custom email
addresses for Service Desk as well. Therefore we need a way to ingest more than
the two system-defined addresses. We will explore a solution path for Microsoft
Graph support where privileged users can connect a custom email account and we
can [receive messages via a Microsoft Graph webhook (`Outlook message`)](https://learn.microsoft.com/en-us/graph/change-notifications-overview#supported-resources).
GitLab would need a public endpoint to receive updates on emails. That might not
work for self-managed instances, so we'll need direct email ingestion for
Microsoft customers as well. But using the webhook approach could improve
performance and efficiency for GitLab SaaS, where we potentially have thousands
of mailboxes to poll.

### Goals

Our goals for this initiative are to enhance the scalability of email ingestion
and slim down the infrastructure significantly.

1. This consolidation will eliminate the need to set up the separate process
   and pave the way for future initiatives, including direct custom email
   address ingestion (IMAP & Microsoft Graph),
   [improved health monitoring](https://gitlab.com/groups/gitlab-org/-/epics/9407),
   [data retention (preserving originals)](https://gitlab.com/groups/gitlab-org/-/epics/10521), and
   [enhanced processing of attachments within email size limits](https://gitlab.com/gitlab-org/gitlab/-/issues/406668).
1. Make it easier for team members to develop features with email ingestion.
   [Right now it needs several manual steps.](https://gitlab.com/gitlab-org/gitlab-development-kit/-/blob/main/doc/howto/service_desk_mail_room.md)

### Non-Goals

This blueprint does not aim to lay out implementation details for all the listed
future initiatives. But it will be the foundation for upcoming features
(customizable Service Desk email addresses via IMAP/Microsoft Graph, health
checks, and so on).

We don't include other ingestion methods. We focus on delivering the current
set: IMAP and Microsoft Graph API for `incoming_email` and `service_desk_email`.

## Current setup

Administrators configure settings (credentials and delivery method) for email
mailboxes (for [`incoming_email`](../../../administration/incoming_email.md) and
[`service_desk_email`](../../../user/project/service_desk/configure.md#use-an-additional-service-desk-alias-email))
in the `gitlab.rb` configuration file. After each change, GitLab needs to be
reconfigured and restarted to apply the new settings.
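
For illustration, an `incoming_email` block in `gitlab.rb` might look like the
following sketch. Hostnames and credentials are placeholders, and the exact set
of keys can vary by GitLab version; the linked documentation is authoritative.

```ruby
# Sketch of an incoming_email configuration in gitlab.rb (placeholder values).
gitlab_rails['incoming_email_enabled'] = true
# Sub-addressing template; %{key} carries the slug and hash mentioned above.
gitlab_rails['incoming_email_address'] = "incoming+%{key}@example.com"
gitlab_rails['incoming_email_email'] = "incoming@example.com"
gitlab_rails['incoming_email_password'] = "<redacted>"
gitlab_rails['incoming_email_host'] = "imap.example.com"
gitlab_rails['incoming_email_port'] = 993
gitlab_rails['incoming_email_ssl'] = true
gitlab_rails['incoming_email_mailbox_name'] = "inbox"
# Recommended delivery method (see below); it also requires the shared key
# file mentioned in the glossary.
gitlab_rails['incoming_email_delivery_method'] = "webhook"
```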

We use the separate process `mail_room` to ingest emails from those mailboxes.
`mail_room` spawns a thread for each configured mailbox and polls those
mailboxes every minute. Between polls the threads are idle. `mail_room` reads
a configuration file that is generated from the settings in `gitlab.rb`.

`mail_room` can connect via IMAP and Microsoft Graph, fetch unread emails, and
mark them as read or deleted (based on settings). It takes an email and
distributes it to its destination via one of the two delivery methods.

### `webhook` delivery method (recommended)

The `webhook` delivery method is the recommended way to move ingested emails
from `mail_room` to GitLab. `mail_room` posts the email body and metadata to an
internal API endpoint, `/api/v4/internal/mail_room`, which selects the correct
handler worker and schedules it for execution.
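
Conceptually, the delivery is a single authenticated POST per email. The Ruby
sketch below illustrates the idea; the header name, JWT claims, and path layout
are simplified assumptions, not the exact wire format.

```ruby
require "net/http"
require "json"
require "jwt"

# Illustrative sketch: deliver one fetched email to the internal endpoint.
# The shared secret corresponds to the key file mentioned in the glossary.
def deliver_email(raw_email, mailbox_type, secret)
  # Per-mailbox path segment is an assumption for this sketch.
  uri = URI("https://gitlab.example.com/api/v4/internal/mail_room/#{mailbox_type}")
  token = JWT.encode({ iss: "mail_room" }, secret, "HS256")

  request = Net::HTTP::Post.new(uri)
  request["Authorization"] = "Bearer #{token}" # assumed header scheme
  request["Content-Type"] = "text/plain"
  request.body = raw_email # the full raw email message

  Net::HTTP.start(uri.host, uri.port, use_ssl: true) do |http|
    http.request(request)
  end
end
```
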
### `sidekiq` delivery method (deprecated since 16.0)

The `sidekiq` delivery method adds the email body and metadata directly to the
Redis queue that Sidekiq uses to manage jobs. It has been
[deprecated in 16.0](../../../update/deprecations.md#sidekiq-delivery-method-for-incoming_email-and-service_desk_email-is-deprecated)
because there is a hard coupling between the delivery method and the Redis
configuration. Moreover, we cannot use Sidekiq framework optimizations such as
job payload compression.

### Sidekiq jobs and job payload size optimizations

We implemented a size limit for Sidekiq jobs, and email job payloads (especially
emails with attachments) are likely to exceed that limit. We should experiment
with the idea of handling email processing directly in the Sidekiq mailbox
ingestion job. We could use an `ops` feature flag to switch between this mode
and a Sidekiq job for each email.

We'd also like to explore a solution path where we only fetch the message IDs
and then download the complete messages in child jobs (filtered by `UID` range,
for example). For example, we poll a mailbox and fetch a list of message IDs.
Then we create a new job for every 25 (or n) emails that takes the message IDs
or the range as an argument. These jobs then download the entire messages and
synchronously add issues or replies. If the number of emails is below 25, we
could even handle the emails directly in the current job to save resources. This
allows us to eliminate the job payload size as the limiting factor for the size
of emails. The disadvantage is that we need to make two calls to the IMAP server
instead of one (n+1). A sketch of this two-phase approach follows.
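
The following Ruby sketch outlines the two-phase idea. The worker names and the
`ImapClient` wrapper are hypothetical; `Gitlab::Email::Receiver` is the existing
processing entry point.

```ruby
# Hypothetical scheduled worker: polls one mailbox and fans out child jobs.
class MailboxPollWorker
  include Sidekiq::Worker

  BATCH_SIZE = 25

  def perform(mailbox_name)
    # Phase 1: fetch only the UIDs of unread messages (a cheap IMAP call).
    uids = ImapClient.new(mailbox_name).unread_uids

    if uids.size <= BATCH_SIZE
      # Small backlog: process inline and save a round of jobs.
      ProcessEmailBatchWorker.new.perform(mailbox_name, uids)
    else
      # Phase 2: one child job per batch of UIDs; the payload stays tiny.
      uids.each_slice(BATCH_SIZE) do |batch|
        ProcessEmailBatchWorker.perform_async(mailbox_name, batch)
      end
    end
  end
end

# Hypothetical child worker: downloads full messages and processes them.
class ProcessEmailBatchWorker
  include Sidekiq::Worker

  def perform(mailbox_name, uids)
    client = ImapClient.new(mailbox_name)
    uids.each do |uid|
      raw_email = client.fetch(uid) # second IMAP call (the n+1 trade-off)
      Gitlab::Email::Receiver.new(raw_email).execute
    end
  end
end
```
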
## Execution plan

### Do nothing

The current setup limits us and only allows fetching from two email addresses.
To publish Service Desk custom email addresses with IMAP or API integration we
would need to deliver the same architecture as described above. Because of that
we should act now, introduce general email ingestion for `incoming_email` and
`service_desk_email` first, and remove the infrastructure overhead.

## Additional resources

# GitLab Service-Integration: AI and Beyond

This document is an abbreviated proposal for Service-Integration to allow teams
within GitLab to rapidly build new application features that leverage AI, ML,
and data technologies.

## Executive Summary

This document proposes a service-integration approach to setting up
infrastructure to allow teams within GitLab to build new application features
that leverage AI, ML, and data technologies at a rapid pace. The scope of the
document is limited specifically to internally hosted features, not third-party
APIs. The current application architecture runs most GitLab application features
in Ruby. However, many ML/AI experiments require different resources and tools,
implemented in different languages, with huge libraries that do not always play
nicely together, and have different hardware requirements. Adding all these
features to the existing infrastructure will rapidly increase the size of the
GitLab application container, resulting in slower startup times, an increased
number of dependencies, greater security risks, reduced development velocity,
and increased complexity due to different hardware requirements. As an
alternative, the proposal suggests adding services to avoid overloading
GitLab's main workloads. These services will run independently with isolated
resources and dependencies. By adding services, GitLab can maintain the
availability and security of GitLab.com, and enable engineers to rapidly iterate
on new ML/AI experiments.

## Scope

The infrastructure, platform, and other changes related to ML/AI experiments are
broad. This blueprint is limited specifically to the following scope:

1. Production workloads, running (directly or indirectly) as a result of
   requests into the GitLab application (`gitlab.com`), or associated
   subdomains (for example, `codesuggestions.gitlab.com`).
1. Excludes requests from the GitLab application, made to third-party APIs
   outside of our infrastructure. From an Infrastructure point-of-view, external
   AI/ML API requests are no different from other API (non ML/AI) requests and
   generally follow the existing guidelines that are in place for calling
   external APIs.
1. Excludes training and tuning workloads not _directly_ connected to our
   production workloads. Training and tuning workloads are distinct from
   production workloads and will be covered by their own blueprint(s).

## Running Production ML/AI experiment workloads

Let's start with some background on how the application is deployed:

1. Most GitLab application features are implemented in Ruby and run in one of
   two types of Ruby deployments: broadly Rails and Sidekiq (although we do
   partition this traffic further for different workloads).
1. These Ruby workloads have two main container images: `gitlab-webservice-ee`
   and `gitlab-sidekiq-ee`. All the code, libraries, binaries, and other
   resources that we use to support the main Ruby part of the codebase are
   embedded within these images.
1. There are thousands of pods running these containers in production for
   GitLab.com at any moment in time. They are started up and shut down at a high
   rate throughout the day as traffic demands on the site fluctuate.
1. For _most_ new features developed, any new supporting resources need to be
   added to either one, or both, of these containers.

[source](https://docs.google.com/drawings/d/1RiTUnsDSkTGaMqK_RfUlCd_rQ6CgSInhfQJNewIKf1M/edit)

Many of the initial discussions focus on adding supporting resources to these
existing containers ([example](https://gitlab.com/gitlab-org/gitlab/-/issues/403630#note_1345192671)).
Choosing this approach would have many downsides, in terms of both the velocity
at which new features can be iterated on, and in terms of the availability of
GitLab.com.

Many of the AI experiments that GitLab is considering integrating into the
application are substantially different from other libraries and tools that have
been integrated in the past.

1. ML toolkits are **implemented in a plethora of languages**, each requiring
   separate runtimes. Python, C, and C++ are the most common, but there is a
   long tail of languages used.
1. There are a very large number of tools that we're looking to integrate with
   and **no single tool will support all the features that are being
   investigated**. TensorFlow, PyTorch, Keras, scikit-learn, and Alpaca are just
   a few examples.
1. **These libraries are huge**. TensorFlow's container image with GPU support
   is 3 GB, PyTorch is 5 GB, Keras is 300 MB, and Prophet is ~250 MB.
1. Many of these **libraries do not play nicely together**: they may have
   dependencies that are not compatible, or require different versions of
   Python or of GPU drivers.

It's likely that in the next few months, GitLab will experiment with many
different features, using many different libraries.

Trying to deploy all of these features into the existing infrastructure would
have many downsides:

1. **The size of the GitLab application container would expand very rapidly** as
   each new experiment introduces a new set of supporting libraries, each
   library as big as, or bigger than, the existing GitLab application within the
   container.
1. **Startup times for new workloads would increase**, potentially impacting the
   availability of GitLab.com during high-traffic periods.
1. The number of dependencies within the container would increase rapidly,
   putting pressure on the engineering teams to **keep ahead of exploits and
   vulnerabilities**.
1. **The security attack surface within the container would be greatly
   increased** with each new dependency. These containers include secrets which,
   if leaked via an exploit, would require costly application-wide secret
   rotation.
1. **Development velocity will be negatively impacted** as engineers work to
   avoid dependency conflicts between libraries.
1. Additionally, there may be **extra complexity due to different hardware
   requirements** for different libraries, with appropriate drivers for GPUs,
   TPUs, specific CUDA versions, and so on.
1. Our Kubernetes workloads have been tuned for the existing multithreaded Ruby
   request (Rails) and message (Sidekiq) processes. Adding extremely
   resource-intensive applications into these workloads would affect unrelated
   requests, **starving requests of CPU and memory and requiring complex tuning
   to ensure fairness**. Failure to do this would impact the availability of
   GitLab.com.



### Proposal: Avoid Overfilling GitLab's Application Containers with Service-Integration

GitLab.com migrated to Kubernetes several years back, but for numerous good
reasons, the application architecture deployed for GitLab.com remains fairly
simple.

Instead of embedding these applications directly into the Rails and/or Sidekiq
containers, we run them as small, independent Kubernetes deployments, isolated
from the main workload.

[source](https://docs.google.com/drawings/d/1ZPprcSYH5Oqp8T46I0p1Hhr-GD55iREDvFWcpQq9dTQ/edit)

The service-integration approach has already been used for the
[GitLab Duo Suggested Reviewers feature](https://gitlab.com/gitlab-com/gl-infra/readiness/-/merge_requests/114)
that has been deployed to GitLab.com.

This approach would have many advantages:

1. **Componentization and Replaceability**: some of these AI feature experiments
   will likely be short-lived. Being able to shut them down (possibly quickly,
   in an emergency, such as a security breach) is important. If they are
   terminated, they are less likely to leave technical debt behind in our main
   application workloads.
1. **Security Isolation**: experimental services can run with access to a
   minimal set of secrets, or possibly none. Ideally, the services would be
   stateless, with data being passed in, processed, and returned to the caller
   without access to PostgreSQL or other data sources. In the event of a remote
   code exploit or other security breach, the attacker would have limited access
   to sensitive data.
   1. In lieu of direct access to the main or CI Postgres clusters, services
      would be provided with access to the internal GitLab API through a
      predefined internal URL. The platform should provide instrumentation and
      monitoring on this address.
   1. In future iterations, but out of scope for the initial delivery, the
      platform could facilitate automatic authentication against the internal
      API, for example by managing and injecting short-lived API tokens into
      internal API calls, or OIDC, etc.
1. **Resource Isolation**: resource-intensive workloads would be isolated to
   individual containers. OOM failures would not impact requests outside of the
   experiment. CPU saturation would not slow down unrelated requests.
1. **Dependency Isolation**: different AI libraries will have conflicting
   dependencies. This will not be an issue if they're run as separate services
   in Kubernetes.
1. **Container Size**: the size of the main application containers is not
   drastically increased, which would otherwise place a burden on the whole
   application.
1. **Distribution Team Bottleneck**: The Distribution team avoids becoming a
   bottleneck as demands for many different libraries to be included in the main
   application containers increase.
1. **Stronger Ownership of Workloads**: teams can better understand how their
   workloads are running as they run in isolation.

However, there are several outstanding questions:

1. **Availability Requirements**: would experimental services have the same
   availability requirements (and alerting requirements) as the main
   application?
1. **Oncall**: would teams be responsible for handling pager alerts for their
   services?
1. **Support for non-SAAS GitLab instances**: initially all experiments would
   target GitLab.com, but eventually we may need to consider how to support
   other instances.
   1. There are three possible modes for services:
      1. `M1`: GitLab.com only: only GitLab.com supports the service.
      1. `M2`: SAAS-hosted for use with self-managed instances and
         instance-hosted: a singular SAAS-hosted service supports self-managed
         instances and GitLab.com. This is similar to the [GitLab Plus proposal](https://gitlab.com/groups/gitlab-org/-/epics/308).
      1. `M3`: Instance-hosted: each instance has a copy of the service.
         GitLab.com has a copy for GitLab.com. Self-managed instances host their
         copy of the service. This is similar to the container registry or
         Gitaly today.
   1. Initially, most experiments will probably be mode `M1` but may be promoted
      to `M2` or `M3` as they mature.
1. **Promotion Process**: ML/AI experimental features will need to be promoted
   to non-experimental status as they mature. A process for this will need to be
   established.

#### Proposed Guidelines for Building ML/AI Services

1. Avoid adding any large ML/AI libraries needed to support experimentation to
   the main application.
1. Create a platform to support individual ML/AI experiments.
1. Encourage supporting services to be stateless (excluding deployed models and
   other resources generated during ML training).
1. ML/AI experiment support services must not access main application
   datastores, including but not limited to the main PostgreSQL, CI PostgreSQL,
   and main application Redis instances.
1. In the main application, client code for services should reside behind a
   feature-flag toggle, for fine-grained control of the feature (see the sketch
   after this list).

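As an illustration of the last guideline, a minimal sketch of feature-flag-gated
client code in the Rails application might look like this. The flag name,
client class, service URL, and fallback behavior are hypothetical; `Feature`,
`Gitlab::HTTP`, and `Gitlab::Json` are existing GitLab APIs.

```ruby
# Hypothetical client for an experimental ML service, gated by a feature flag.
module MlExperiments
  class SuggestedLabelsClient
    def suggestions_for(issue)
      # Roll out per project; can be disabled instantly in an emergency.
      return [] unless Feature.enabled?(:ml_suggested_labels, issue.project)

      # Stateless call to the isolated service over a predefined internal URL.
      response = Gitlab::HTTP.post(
        "http://ml-suggested-labels.internal.example.com/predict",
        body: { title: issue.title, description: issue.description }.to_json,
        headers: { "Content-Type" => "application/json" }
      )
      Gitlab::Json.parse(response.body).fetch("labels", [])
    rescue StandardError
      [] # degrade gracefully; the experiment must not break the main flow
    end
  end
end
```
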
#### Technical Details

##### Platform Requirements

To quickly deploy and manage experiments, a minimally viable platform will need
to be provided to stage-group teams. The technical implementation details of
this platform are out of scope for this blueprint and will require their own
blueprint (to follow).

However, Service-Integration will establish certain necessary and optional
requirements that the platform will need to satisfy.

###### Ease of Use, Ownership Requirements

## Goals

The goal of this blueprint is to describe viable options for RAG at GitLab
across deployment types. The aim is to describe RAG implementations that provide
our AI features, and by extension our customers, with best-in-class user
experiences.

## Overview of RAG

RAG, or Retrieval Augmented Generation, involves several key process blocks:

- **Input Transformation**: This step involves processing the user's input,
  which can vary from natural language text to JSON or keywords. For effective
  query construction, we might utilize Large Language Models (LLMs) to format
  the input into a standard expected format or to extract specific keywords.
- **Retrieval**: Here, we fetch relevant data from specified data sources, which
  may include diverse storage engines like vector, graph, or relational
  databases. It's crucial to conduct [data access checks](#data-access-policy)
  during this phase. After retrieval, the data should be optimized for LLMs
  through post-processing to enhance the quality of the generated responses.
- **Generation**: This phase involves crafting a prompt with the retrieved data
  and submitting it to an LLM, which then generates an AI-powered response. A
  sketch of these three blocks chained together follows the diagram below.


### Data for LLMs

Ensuring data is optimized for LLMs is crucial for consistently generating
high-quality AI responses. Several challenges exist when providing context to
LLMs:

- **Long Contexts:** Extensive contexts can degrade LLM performance, a
  phenomenon known as the Lost in the Middle problem. Employing rerankers can
  enhance performance but may also increase computational costs due to longer
  processing times.
- **Duplicate Content:** Repetitive content can reduce the diversity of search
  results. For instance, if a semantic search yields ten results indicating
  "Tom is a president" but the eleventh reveals "Tom lives in the United
  States," solely using the top ten would omit critical information. Filtering
  out duplicate content, for example through Maximal Marginal Relevance (MMR),
  can mitigate this issue (see the sketch after this list).
- **Conflicting Information:** Retrieving conflicting data from multiple sources
  can lead to LLM "hallucinations." For example, mixing sources that define
  "RAG" differently can confuse the LLM. Careful source selection and content
  curation are essential.
- **Irrelevant Content:** Including irrelevant data can negatively impact LLM
  performance. Setting a threshold for relevance scores, or recognizing that
  certain irrelevant content might actually enhance output quality, are
  strategies to address this challenge.
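
A compact Ruby sketch of MMR under stated assumptions: each candidate exposes an
`#embedding` array and a `#relevance` score (its similarity to the query), and
the weights are illustrative rather than tuned values.

```ruby
# Greedy MMR: balance query relevance against redundancy with already-selected
# results. Returns the k most relevant yet diverse candidates.
def mmr_select(candidates, k: 5, lambda_weight: 0.7)
  selected = []
  remaining = candidates.dup

  while selected.size < k && remaining.any?
    best = remaining.max_by do |doc|
      redundancy =
        selected.map { |s| cosine(doc.embedding, s.embedding) }.max || 0.0
      lambda_weight * doc.relevance - (1 - lambda_weight) * redundancy
    end
    selected << remaining.delete(best)
  end

  selected
end

def cosine(a, b)
  dot = a.zip(b).sum { |x, y| x * y }
  dot / (Math.sqrt(a.sum { |x| x * x }) * Math.sqrt(b.sum { |x| x * x }))
end
```
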
It's highly recommended to evaluate the optimal data format and size for
maximizing LLM performance, as the effects on performance and result quality can
vary significantly based on the data's structure.

References:

#### Regenerating Embeddings

The AI field is evolving rapidly, and new models and approaches that could
improve our users' experience seem to appear daily, so we want to be conscious
of model-switching costs. If we decide to swap models or change our chunking
strategy (as two examples), we will need to wipe our existing embeddings and do
a full replacement with embeddings from the new model or with the new text
chunks. Factors to consider which could trigger the need for a full regeneration
of embeddings for the affected data include:

- A change in the optimal text chunk size
- A change in a preprocessing step which perhaps adds new fields to a text chunk

### Multi-source Retrieval

Addressing complex queries may require data from multiple sources. For instance,
queries linking issues to merge requests necessitate fetching details from both.
GitLab Duo Chat, utilizing the
[ReACT framework](https://arxiv.org/abs/2210.03629), sequentially retrieves data
from PostgreSQL tables, which can prolong the retrieval process due to the
sequential execution of multiple tools and LLM inferences.

## Searching for Data

Choosing the appropriate search method is pivotal for feature design and UX
optimization.

### Semantic Search

Semantic search shines when handling complex queries that demand an
understanding of the context or intent behind the words, not just the words
themselves. It's particularly effective for queries expressed in natural
language, such as full sentences or questions, where the overall meaning
outweighs the importance of specific keywords. Semantic search excels at
providing thorough coverage of a topic, capturing related concepts that may not
be directly mentioned in the query, thus uncovering more nuanced or indirectly
related information.

In the realm of semantic search, the K-Nearest Neighbors (KNN) method is
commonly employed to identify data segments that are semantically closer to the
user's input. To measure the semantic proximity, various methods are used:

- **Cosine Similarity:** Focuses solely on the direction of vectors.
- **L2 Distance (Euclidean Distance):** Takes into account both the direction
  and magnitude of vectors.

These vectors, known as "embeddings," are created by processing the data source
through an embedding model. Currently, in GitLab production, we utilize the
`textembedding-gecko` model provided by Vertex AI. However, there might be
scenarios where you consider using alternative embedding models, such as those
available on HuggingFace, to reduce costs. Opting for different models requires
comprehensive evaluation and consultation, particularly with the legal team, to
ensure the chosen model's usage complies with GitLab policies. See the
[Security, Legal, and Compliance](https://gitlab.com/gitlab-org/gitlab/-/blob/52f4fcb033d13f3d909a777728ba8f3fa2c93256/doc/architecture/blueprints/gitlab_duo_rag/index.md#security-legal-and-compliance)
section for more details. It's also important to note that multilingual support
can vary significantly across different embedding models, and switching models
may lead to regressions.
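
As a toy illustration of KNN over embeddings, the Ruby sketch below ranks a
corpus by L2 distance (cosine similarity was sketched in the MMR example above).
Real deployments use a vector store with an index rather than a linear scan.

```ruby
# Illustrative brute-force KNN; `corpus` holds objects with #embedding arrays.
def l2_distance(a, b)
  Math.sqrt(a.zip(b).sum { |x, y| (x - y)**2 })
end

def knn(query_embedding, corpus, k: 10)
  # min_by(k) returns the k documents with the smallest distance to the query.
  corpus.min_by(k) { |doc| l2_distance(query_embedding, doc.embedding) }
end
```
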
For large datasets, it's advisable to implement indexes to enhance query
performance. The HNSW (Hierarchical Navigable Small World) method, combined with
approximate nearest neighbors (ANN) search, is a popular strategy for this
purpose. For insights into HNSW's effectiveness, consider reviewing
[benchmarks on its performance in large-scale applications](https://supabase.com/blog/increase-performance-pgvector-hnsw).

### Keyword Search

Keyword search is the go-to method for straightforward, specific queries where
users are clear about their search intent and can provide precise terms or
phrases. This method is highly effective for retrieving exact matches, making it
suitable for searches within structured databases or when looking for specific
documents, terms, or phrases.
|
||||
|
||||
Keyword search operates on the principle of matching the query terms directly with the content in the database or document collection, prioritizing results that have a high frequency of the query terms. Its efficiency and directness make it particularly useful for situations where users expect quick and precise results based on specific keywords or phrases.
|
||||
Keyword search operates on the principle of matching the query terms directly
|
||||
with the content in the database or document collection, prioritizing results
|
||||
that have a high frequency of the query terms. Its efficiency and directness
|
||||
make it particularly useful for situations where users expect quick and precise
|
||||
results based on specific keywords or phrases.
|
||||
|
||||
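
To illustrate the core principle (a toy term-frequency ranker, not how GitLab's
search backends are implemented):

```python
from collections import Counter

documents = {
    1: "gitlab ci pipeline configuration for docker builds",
    2: "docker docker docker compose tutorial",
    3: "merge request review guidelines",
}

def keyword_score(query: str, text: str) -> int:
    # Count how often each query term occurs in the document.
    term_counts = Counter(text.split())
    return sum(term_counts[term] for term in query.split())

query = "docker pipeline"
ranked = sorted(
    documents,
    key=lambda doc_id: keyword_score(query, documents[doc_id]),
    reverse=True,
)
print(ranked)  # [2, 1, 3]: documents with more query-term occurrences rank first.
```
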
### Hybrid Search

Hybrid search combines the depth of semantic search with the precision of
keyword search, offering a comprehensive search solution that caters to both
context-rich and specific queries. By running both semantic and keyword searches
simultaneously, it integrates the strengths of both methods: semantic search's
ability to understand the context and keyword search's precision in identifying
exact matches.

The results from both searches are then combined, with their relevance scores
normalized to provide a unified set of results. This approach is particularly
effective in scenarios where queries may not be fully served by either method
alone, offering a balanced and nuanced response to complex search needs. The
computational demands of kNN searches, which are part of semantic search, are
contrasted with the relative efficiency of [BM25](https://pub.aimind.so/understanding-the-bm25-ranking-algorithm-19f6d45c6ce)
keyword searches, making hybrid search a strategic choice for optimizing
performance across diverse datasets.
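
A minimal sketch of the combination step (min-max normalization with equal
weights is illustrative; real systems may prefer other fusion strategies such
as reciprocal rank fusion):

```python
def min_max_normalize(scores: dict) -> dict:
    # Rescale raw scores to [0, 1] so the two result sets are comparable.
    lo, hi = min(scores.values()), max(scores.values())
    span = (hi - lo) or 1.0
    return {doc: (s - lo) / span for doc, s in scores.items()}

# Hypothetical raw scores from the two searches for the same query.
semantic_scores = {"doc_a": 0.82, "doc_b": 0.79, "doc_c": 0.40}
keyword_scores = {"doc_b": 12.5, "doc_d": 9.1, "doc_a": 2.3}

semantic = min_max_normalize(semantic_scores)
keyword = min_max_normalize(keyword_scores)

# Weighted sum of the normalized scores; a document found by only one
# method contributes a zero for the other.
combined = {
    doc: 0.5 * semantic.get(doc, 0.0) + 0.5 * keyword.get(doc, 0.0)
    for doc in set(semantic) | set(keyword)
}
for doc, score in sorted(combined.items(), key=lambda kv: kv[1], reverse=True):
    print(doc, round(score, 3))
```
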
### Code Search

Like the other data types above, a source code search task can utilize different
search types, each more suited to address different queries. Currently,
[Zoekt](../code_search_with_zoekt/index.md) is employed on GitLab.com to provide
exact match keyword search and regular expression search capabilities for source
code. Semantic search and hybrid search functionalities are yet to be
implemented for code.

### ID Search

Facilitates data retrieval using a specific resource ID, such as an issue link
or a short reference. See [ID search](postgresql.md#id-search) for more
information.
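
A toy sketch of the extraction step (the URL pattern and helper are
illustrative, not GitLab's actual reference parser):

```python
import re

# Illustrative pattern for issue URLs like
# https://gitlab.com/gitlab-org/gitlab/-/issues/451431
ISSUE_URL = re.compile(
    r"https://gitlab\.com/(?P<project>[\w./-]+)/-/issues/(?P<iid>\d+)"
)

def extract_issue_reference(text: str):
    """Return (project_path, issue_iid) if the text contains an issue link."""
    match = ISSUE_URL.search(text)
    if match:
        return match.group("project"), int(match.group("iid"))
    return None

print(extract_issue_reference(
    "Summarize https://gitlab.com/gitlab-org/gitlab/-/issues/451431 please"
))  # ('gitlab-org/gitlab', 451431)
```
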
### Knowledge Graph

Knowledge Graph search transcends the limitations of traditional search methods
by leveraging the interconnected nature of data represented in graph form.
Unlike semantic search, which focuses on content similarity, Knowledge Graph
search understands and utilizes the relationships between different data points,
providing a rich, contextual exploration of data.

This approach is ideal for queries that benefit from understanding the broader
context or the interconnectedness of data entities. Graph databases store
relationships alongside the data, enabling complex queries that can navigate
these connections to retrieve highly contextual and nuanced information.

Knowledge Graphs are particularly useful in scenarios requiring deep insight
into the relationships between entities, such as recommendation systems, complex
data analysis, and semantic querying, offering a dynamic way to explore and
understand large, interconnected datasets.
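
To illustrate relationship-aware retrieval, here is a toy sketch (hypothetical
entities and edges) that walks a small graph to collect everything connected to
a starting entity within two hops:

```python
from collections import deque

# Hypothetical knowledge graph: entity -> related entities.
graph = {
    "issue:42": ["epic:7", "merge_request:99"],
    "epic:7": ["issue:42", "issue:43"],
    "merge_request:99": ["commit:abc123"],
    "issue:43": [],
    "commit:abc123": [],
}

def related_within(start: str, max_hops: int) -> set:
    """Breadth-first traversal collecting entities reachable in max_hops."""
    seen, queue = {start}, deque([(start, 0)])
    while queue:
        node, hops = queue.popleft()
        if hops == max_hops:
            continue
        for neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, hops + 1))
    return seen - {start}

print(related_within("issue:42", 2))
# {'epic:7', 'merge_request:99', 'issue:43', 'commit:abc123'}
```
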
## Security, Legal and Compliance

### Data access policy

The retrieval process must comply with the
[GitLab Data Classification Standard](https://handbook.gitlab.com/handbook/security/data-classification-standard/).
If the user doesn't have access to the data, GitLab will not fetch the data for
building a prompt.

For example:

- When the data is GitLab Documentation (GREEN level), the data can be fetched
  without authorizations.
- When the data is customer data such as issues, merge requests, etc. (RED
  level), the data must be fetched with proper authorizations based on
  permissions and roles.
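
A sketch of the shape of such a gate (the helper names and user structure are
hypothetical; the real checks live in GitLab's permission framework):

```python
from dataclasses import dataclass

@dataclass
class Document:
    classification: str  # for example "GREEN" or "RED"
    resource_id: str

def user_can_read(user: dict, document: Document) -> bool:
    # Hypothetical stand-in for GitLab's real permission checks
    # (membership, role, visibility level, and so on).
    return document.resource_id in user.get("readable_resources", set())

def load_content(document: Document) -> str:
    return f"<content of {document.resource_id}>"  # placeholder loader

def fetch_for_prompt(user: dict, document: Document):
    """Only return content the user is allowed to see."""
    if document.classification == "GREEN":
        return load_content(document)  # public docs: no authorization needed
    if user_can_read(user, document):
        return load_content(document)
    return None  # excluded from the prompt entirely
```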

If you're proposing to fetch data from an external public database (for example,
fetching data from `arxiv.org` so the LLM can answer questions about
quantitative biology), please conduct a thorough review to ensure the external
data isn't inappropriate for GitLab to process.

### Data usage

Using a new embedding model or persisting data into a new storage would require
[legal reviews](https://handbook.gitlab.com/handbook/legal/). See the following
links for more information:

- [Data privacy](../../../user/gitlab_duo/data_usage.md#data-privacy)
- [Data retention](../../../user/gitlab_duo/data_usage.md#data-retention)

@ -124,11 +235,18 @@ Using a new embedding model or persisting data into a new storage would require

## Evaluation

Evaluation is a crucial step in objectively determining the quality of the
retrieval process. Tailoring the retrieval process based on specific user
feedback can lead to biased optimizations, potentially causing regressions for
other users. It's essential to have a dedicated test dataset and tools for a
comprehensive quality assessment. For assistance with AI evaluation, please
reach out to the [AI Model Validation Group](https://handbook.gitlab.com/handbook/engineering/development/data-science/model-validation/).
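
As one example of an objective metric, here is a sketch of recall@k over a
small test dataset (the data and search function are hypothetical; GitLab's
actual evaluation tooling is owned by the AI Model Validation group):

```python
def recall_at_k(retrieved: list, relevant: set, k: int) -> float:
    """Fraction of relevant documents found in the top-k results."""
    hits = sum(1 for doc in retrieved[:k] if doc in relevant)
    return hits / len(relevant) if relevant else 0.0

# Hypothetical test dataset: query -> documents a human judged relevant.
test_set = {
    "how to configure CI caching": {"doc_12", "doc_40"},
    "rotate a personal access token": {"doc_7"},
}

def evaluate(search_fn, k: int = 5) -> float:
    scores = [
        recall_at_k(search_fn(query), relevant, k)
        for query, relevant in test_set.items()
    ]
    return sum(scores) / len(scores)

# `search_fn` is any retrieval function returning ranked document IDs.
print(evaluate(lambda query: ["doc_12", "doc_3", "doc_7", "doc_40"], k=3))
```
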
## Before Implementing RAG

Before integrating Retrieval Augmented Generation (RAG) into your system, it's
important to evaluate whether it enhances the quality of AI-generated responses.
Consider these essential questions:

- **What does typical user input look like?**
  - For instance, "Which class should we use to make an external HTTP request
    in this repository?"

@ -141,11 +259,20 @@ Before integrating Retrieval Augmented Generation (RAG) into your system, it's i

- **Consider the current search method used for similar tasks**. (Ask yourself:
  How would I currently search for this data with the tools at my disposal?)
  - Example: Navigate to the code search page and look for occurrences of "http."
- **Have you successfully generated the desired AI response with sample data?** Experiment in a third-party prompt playground or Google Colab to test.
- **If contemplating semantic search**, it's **highly recommended** that you
  develop a prototype first to ensure it meets your specific retrieval needs.
  Semantic search may interpret queries differently than expected, especially
  when the data source lacks natural language context, such as uncommented code.
  In such cases, semantic search might not perform as well as traditional
  keyword search methods. Here's [an example prototype](https://colab.research.google.com/drive/1K1gf6FibV-cjlXvTJPboQJtjYcSsyYi2?usp=sharing)
  that demonstrates semantic search for CI job configurations.

## Evaluated Solutions

The following solutions have been validated with PoCs to ensure they meet the
basic requirements of vector storage and retrieval for GitLab Duo Chat with
GitLab documentation. Click the links to learn more about each solution's
attributes that relate to RAG:

- [PostgreSQL with PGVector](postgresql.md)
- [Elasticsearch](elasticsearch.md)

@ -161,6 +288,12 @@ To read more about the [GitLab Duo Chat PoCs](../gitlab_duo_rag/index.md) conduc

_Disclaimer: This blueprint is in the first iteration and the chosen solutions could change._

Due to the existing framework and scalability of Elasticsearch, embeddings will
be stored on Elasticsearch for large datasets such as
[issues](https://gitlab.com/gitlab-org/gitlab/-/issues/451431), merge requests,
and more. This will be used to perform [Hybrid Search](https://gitlab.com/gitlab-org/gitlab/-/issues/440424)
but will also be useful for other features such as finding duplicates, finding
similar results, or categorizing documents.

[Vertex AI Search](../gitlab_duo_rag/vertex_ai_search.md) is going to be
implemented to serve GitLab Duo documentation for self-managed instances.

@ -14,15 +14,30 @@ participating-stages: []

GitLab and Google Cloud have recently [announced](https://about.gitlab.com/blog/2023/08/29/gitlab-google-partnership-s3c/)
a partnership to combine the unique capabilities of their platforms.

As highlighted in the announcement, one key goal is the ability to
"_use Google's Artifact Registry with GitLab pipelines and packaging to create a security data plane_".
The initial step toward this goal is to allow users to configure a new
[Google Artifact Registry](https://cloud.google.com/artifact-registry) (abbreviated as GAR from now on)
[project integration](../../../user/project/integrations/index.md) and display
[container image artifacts](https://cloud.google.com/artifact-registry/docs/supported-formats)
in the GitLab UI.

## Motivation

Refer to the [announcement](https://about.gitlab.com/blog/2023/08/29/gitlab-google-partnership-s3c/)
blog post for more details about the motivation and long-term goals of the
GitLab and Google Cloud partnership.

Regarding the scope of this design document, our primary focus is to fulfill the
Product requirement of providing users with visibility over their container
images in GAR. The motivation for this specific goal is rooted in foundational
research on the use of external registries as a complement to the GitLab
container registry ([internal](https://gitlab.com/gitlab-org/ux-research/-/issues/2602)).

Since this marks the first step in the GAR integration, our aim is to achieve
this goal in a way that establishes a foundation to facilitate reusability in
the future. This groundwork could benefit potential future expansions, such as
support for additional artifact formats (npm, Maven, etc.), and features beyond
the Package stage (for example, vulnerability scanning and deployments).

### Goals

@ -74,21 +89,54 @@ As previously highlighted, access to the GAR integration features is restricted

#### Resource Mapping

For the [GitLab container registry](../../../user/packages/container_registry/index.md),
repositories within a specific project must have a path that matches the project
full path. This is essentially how we establish a resource mapping between
GitLab Rails and the registry, which serves multiple purposes, including
granular authorization, scoping storage usage to a given
project/group/namespace, and more.

Regarding the GAR integration, since there are no equivalent entities for GitLab
project/group/namespace resources on the GAR side, we aim to simplify matters by
allowing users to attach any [GAR repository](https://cloud.google.com/artifact-registry/docs/repositories)
to any GitLab project, regardless of their respective paths. Similarly, we do
not plan to restrict the attachment of a particular GAR repository to a single
GitLab project. Ultimately, it is up to users to determine how to organize both
datasets in the way that best suits their needs.

#### GAR API

GAR provides three APIs: Docker API, REST API, and RPC API.

The [Docker API](https://cloud.google.com/artifact-registry/docs/reference/docker-api)
is based on the [Docker Registry HTTP API V2](https://distribution.github.io/distribution/spec/api/),
now superseded by the [OCI Distribution Specification API](https://github.com/opencontainers/distribution-spec/blob/main/spec.md)
(from now on referred to as OCI API). This API is used for pushing/pulling
images to/from GAR and also provides some discoverability operations. Refer to
[Alternative Solutions](#alternative-solutions) for the reasons why we don't
intend to use it.

Among the proprietary GAR APIs, the [REST API](https://cloud.google.com/artifact-registry/docs/reference/rest)
provides basic functionality for managing repositories. This includes
[`list`](https://cloud.google.com/artifact-registry/docs/reference/rest/v1/projects.locations.repositories.dockerImages/list)
and [`get`](https://cloud.google.com/artifact-registry/docs/reference/rest/v1/projects.locations.repositories.dockerImages/get)
operations for container image repositories, which could be used for this
integration. Both operations return the same data structure, represented by the
[`DockerImage`](https://cloud.google.com/artifact-registry/docs/reference/rest/v1/projects.locations.repositories.dockerImages#DockerImage)
object, so both provide the same level of detail.

Last but not least, there is also an [RPC API](https://cloud.google.com/artifact-registry/docs/reference/rpc/google.devtools.artifactregistry.v1),
backed by gRPC and Protocol Buffers. This API provides the most functionality,
covering all GAR features. From the available operations, we can make use of the
[`ListDockerImagesRequest`](https://cloud.google.com/artifact-registry/docs/reference/rpc/google.devtools.artifactregistry.v1#listdockerimagesrequest)
and [`GetDockerImageRequest`](https://cloud.google.com/artifact-registry/docs/reference/rpc/google.devtools.artifactregistry.v1#google.devtools.artifactregistry.v1.GetDockerImageRequest)
operations. As with the REST API, both responses are composed of
[`DockerImage`](https://cloud.google.com/artifact-registry/docs/reference/rpc/google.devtools.artifactregistry.v1#google.devtools.artifactregistry.v1.DockerImage)
objects.

Between the two proprietary API options, we chose the RPC one because it
provides support not only for the operations we need today but also offers
better coverage of all GAR features, which will be beneficial in future
iterations. Finally, we do not intend to make direct use of this API but rather
use it through the official Ruby client SDK. See [Client SDK](backend.md#client-sdk)
for more details.
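
To give a feel for the RPC surface, here is a sketch of the `ListDockerImages`
call using Google's Python client library (assuming the
`google-cloud-artifact-registry` package; the blueprint itself plans to use the
official Ruby SDK, and the resource names below are placeholders):

```python
from google.cloud import artifactregistry_v1

client = artifactregistry_v1.ArtifactRegistryClient()

# Placeholder resource name; the format is
# projects/{project}/locations/{location}/repositories/{repository}
parent = "projects/my-gcp-project/locations/us-east1/repositories/my-repo"

request = artifactregistry_v1.ListDockerImagesRequest(parent=parent)

# Each result is a DockerImage message, the same structure the REST API returns.
for image in client.list_docker_images(request=request):
    print(image.uri, image.tags)
```
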
#### Backend Integration

@ -96,7 +144,10 @@ This integration will need several changes on the backend side of the rails proj

#### UI/UX

This integration will include a dedicated page named "Google Artifact Registry,"
listed under the "Operate" section of the sidebar. This page will enable users
to view the list of all container images in the configured GAR repository. See
the [UI/UX](ui_ux.md) page for additional details.

#### GraphQL APIs

@ -106,7 +157,11 @@ This integration will include a dedicated page named "Google Artifact Registry,"

### Use Docker/OCI API

One alternative solution considered was to use the Docker/OCI API provided by
GAR, as it is a common standard for container registries. This approach would
have allowed GitLab to reuse [existing logic](https://gitlab.com/gitlab-org/gitlab/-/blob/20df77103147c0c8ff1c22a888516eba4bab3c46/lib/container_registry/client.rb)
for connecting to container registries, which could potentially speed up
development. However, there were several drawbacks to this approach:

- **Authentication Complexity**: The API requires authentication tokens, which
  need to be requested at the [login endpoint](https://distribution.github.io/distribution/spec/auth/token/).
  These tokens have limited validity, adding complexity to the authentication
  process. Handling expiring tokens would have been necessary.

@ -116,6 +171,17 @@ One alternative solution considered was to use the Docker/OCI API provided by GA

- **Multiple Requests**: To retrieve all the required information about each
  image, multiple requests to different endpoints (listing tags, obtaining
  image manifests, and image configuration blobs) would have been necessary,
  leading to a `1+N` performance issue.

GitLab had previously faced significant challenges with the last two limitations,
prompting the development of a custom
[GitLab container registry API](https://gitlab.com/gitlab-org/container-registry/-/blob/master/docs/spec/gitlab/api.md)
to address them. Additionally, GitLab decided to
[deprecate support](../../../update/deprecations.md#use-of-third-party-container-registries-is-deprecated)
for connecting to third-party container registries using the Docker/OCI API due
to these same limitations and the increased cost of maintaining two solutions in
parallel. As a result, there is an ongoing effort to replace the use of the
Docker/OCI API endpoints with custom API endpoints for all container registry
functionalities in GitLab.

Considering these factors, the decision was made to build the GAR integration
from scratch using the proprietary GAR API. This approach provides more
flexibility and control over the integration and can serve as a foundation for
future expansions, such as support for other GAR artifact formats.

@ -10,11 +10,31 @@ participating-stages: []

# Image resizing for avatars and content images

Currently, we are showing all uploaded images 1:1, which is of course not ideal.
To improve performance greatly, add image resizing to the backend. There are two
main areas of image resizing to consider: avatars and content images. The MVC
for this implementation focuses on Avatars. Avatar requests make up
approximately 70% of total image requests. There is an identified set of sizes
we intend to support, which makes the scope of this first MVC very narrow.
Content image resizing has many more considerations for size and features. It is
entirely possible that we have two separate development efforts with the same
goal of increasing performance via image resizing.

## MVC Avatar Resizing

When implementing a dynamic image resizing solution, images should be resized
and optimized on the fly so that if we define new targeted sizes later we can
add them dynamically. This would mean a huge improvement in performance, as some
of the measurements suggest that we can save up to 95% of our current load size.
Our initial investigations indicate that we have uploaded approximately
1.65 million avatars totaling approximately 80 GB in size and averaging
approximately 48 KB each. Early measurements indicate we can reduce the most
common avatar dimensions to between 1-3 KB in size, netting us a greater than
90% size reduction. For the MVC we don't consider application-level caching and
rely purely on HTTP-based caches as implemented in CDNs and browsers, but might
revisit this decision later on. To mitigate performance issues with avatar
resizing, especially in the case of self-managed instances, an operations
feature flag is implemented to disable dynamic image resizing.
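
A minimal sketch of the resize-on-the-fly idea (using Pillow for illustration;
the size allow-list and helper are hypothetical, and the actual MVC lives in
Workhorse, which is written in Go):

```python
from io import BytesIO
from PIL import Image

# Illustrative allow-list: only predefined avatar sizes are served resized.
ALLOWED_SIZES = {16, 24, 32, 48, 64, 96}

def resize_avatar(original: bytes, width: int) -> bytes:
    """Return the avatar resized to width x width, or raise for unknown sizes."""
    if width not in ALLOWED_SIZES:
        raise ValueError(f"unsupported avatar size: {width}")
    image = Image.open(BytesIO(original))
    image.thumbnail((width, width))
    output = BytesIO()
    image.save(output, format="PNG", optimize=True)
    return output.getvalue()
```
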
```mermaid
sequenceDiagram

@ -35,16 +55,22 @@ sequenceDiagram

## Content Image Resizing

Content image resizing is a more complex problem to tackle. There are no set
size restrictions and there are additional features or requirements to consider.

- Dynamic WebP support - the WebP format typically achieves an average of 30% more
  compression than JPEG without the loss of image quality. More details are in
  [this Google Comparative Study](https://developers.google.com/speed/webp/docs/c_study)
  (see the conversion sketch after this list)
- Extract the first GIF frame so we can avoid loading 10 MB of pixels
- Check Device Pixel Ratio to deliver nice images on High DPI screens
- Progressive image loading, similar to what is described in
  [this article about how to build a progressive image loader](https://www.sitepoint.com/how-to-build-your-own-progressive-image-loader/)
- Resizing recommendations (for example, size and clarity)
- Storage
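
The conversion sketch referenced in the list above (a toy Pillow re-encode; the
quality value is illustrative):

```python
from io import BytesIO
from PIL import Image

def to_webp(jpeg_bytes: bytes, quality: int = 80) -> bytes:
    """Re-encode a JPEG as WebP; typically yields a noticeably smaller payload."""
    image = Image.open(BytesIO(jpeg_bytes))
    output = BytesIO()
    image.save(output, format="WEBP", quality=quality)
    return output.getvalue()
```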

The MVC Avatar resizing implementation is integrated into Workhorse. With the
extra requirements for content image resizing, this may require further use of
GraphicsMagick (GM) or a similar library and breaking it out of Workhorse.

## Iterations

@ -12,32 +12,55 @@ Users can become an Organization member in the following way:

Organization members can get access to Groups and Projects in an Organization as:

- A Group Member: this grants access to the Group and all its Projects,
  regardless of their visibility.
- A Project Member: this grants access to the Project, and limited access to
  parent Groups, regardless of their visibility.
- A Non-Member: this grants access to public and internal Groups and Projects of
  that Organization. To access a private Group or Project in an Organization, a
  user must become a member. Internal visibility will not be available for
  Organizations in Cells 1.0.

Organization members can be managed in the following ways:

- As [Enterprise Users](../../../user/enterprise_user/index.md), managed by the
  Organization. This includes control over their User account and the ability to
  block the User. In the context of Cells 1.0, Organization members will
  essentially function like Enterprise Users.
- As Non-Enterprise Users, managed by the default Organization. Non-Enterprise
  Users can be removed from an Organization, but the User keeps ownership of
  their User account. This will only be considered post Cells 1.0.

Enterprise Users are only available to Organizations with a Premium or Ultimate
subscription. Organizations on the free tier will only be able to host
Non-Enterprise Users.

## How do Users join an Organization?

Users are visible across all Organizations. This allows Users to move between
Organizations. Users can join an Organization by:

1. Being invited by an Organization Owner. Because Organizations are private on
   Cells 1.0, only the Organization Owner can add new Users to an Organization
   by inviting them to create an account.

1. Becoming a Member of a Namespace (Group, Subgroup, or Project) contained
   within an Organization. A User can become a Member of a Namespace by:

   - Being invited by username
   - Being invited by email address
   - Requesting access. This requires visibility of the Organization and
     Namespace and must be accepted by the owner of the Namespace. Access cannot
     be requested to private Groups or Projects.

1. Becoming an Enterprise User of an Organization. Bringing Enterprise Users to
   the Organization level is planned post MVC. For the Organization MVC,
   Enterprise Users will remain at the top-level Group.

The creator of an Organization automatically becomes the Organization Owner. It
is not necessary to become a User of a specific Organization to comment on or
create public issues, for example. All existing Users can create and comment on
all public issues.

## How do Users sign in to an Organization?

@ -45,16 +68,26 @@ TBD

## When can Users see an Organization?

For Cells 1.0, an Organization can only be private. Private Organizations can
only be seen by their Organization members. They can only contain private Groups
and Projects.

For Cells 1.5, Organizations can also be public. Public Organizations can be
seen by everyone. They can contain public and private Groups and Projects.

In the future, Organizations will get an additional internal visibility setting
for Groups and Projects. This will allow us to introduce internal Organizations
that can only be seen by the Users they contain. This would mean that only Users
that are part of the Organization will see:

- The Organization front page, instead of a 404 when navigating to the
  Organization URL
- Name of the Organization
- Description of the Organization
- Organization pages, such as the Activity page, Groups, Projects, and Users
  overview. Content of these pages will be determined by each User's access to
  specific Groups and Projects. For instance, private Projects would only be
  seen by the members of this Project in the Project overview.
- Internal Groups and Projects

As an end goal, we plan to offer the following scenarios:

@ -70,23 +103,36 @@ As an end goal, we plan to offer the following scenarios:

## What can Users see in an Organization?

Users can see the things that they have access to in an Organization. For
instance, an Organization member would be able to access only the private Groups
and Projects that they are a member of, but could see all public Groups and
Projects. Actionable items such as issues, merge requests, and the to-do list
are seen in the context of the Organization. This means that a User might see
10 merge requests they created in `Organization A`, and 7 in `Organization B`,
when in total they have created 17 merge requests across both Organizations.

## What is a Billable Member?

How Billable Members are defined differs between GitLab's two main offerings:

- Self-managed (SM): [Billable Members are Users who consume seats against the SM License](../../../subscriptions/self_managed/index.md#subscription-seats).
  Custom roles elevated above the Guest role consume seats.
- GitLab.com (SaaS): [Billable Members are Users who are Members of a Namespace (Group or Project) that consume a seat against the SaaS subscription for the top-level Group](../../../subscriptions/gitlab_com/index.md#how-seat-usage-is-determined).
  Currently, [Users with Minimal Access](../../../user/permissions.md#users-with-minimal-access)
  and Users without a Group count towards a licensed seat, but [that's changing](https://gitlab.com/gitlab-org/gitlab/-/issues/330663#note_1133361094).

These differences and how they are calculated and displayed often cause
confusion. For both SM and SaaS, we evaluate whether a User consumes a seat
against the same core rule set:

1. They are active users
1. They are not bot users
1. For the Ultimate tier, they are not a Guest
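
Expressed as a sketch (the field names are hypothetical; the real checks live
in GitLab's billing code):

```python
def consumes_seat(user: dict, tier: str) -> bool:
    """Apply the three core billable-seat rules described above."""
    if not user.get("active"):        # rule 1: active users only
        return False
    if user.get("bot"):               # rule 2: bots never consume seats
        return False
    if tier == "ultimate" and user.get("highest_role") == "guest":
        return False                  # rule 3: Ultimate excludes Guests
    return True

print(consumes_seat(
    {"active": True, "bot": False, "highest_role": "developer"}, "ultimate"
))  # True
```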

For (1) this is determined differently per offering, in terms of both what
classifies as active and also due to the underlying model that we refer to
(User vs Member). To help demonstrate the various associations used in GitLab
relating to Billable Members, here is a relationship diagram:

```mermaid
graph TD
@ -104,30 +150,55 @@ graph TD
PGL -.belongs to.->C
```

GroupGroupLink is the join table between two Group records, indicating that one
Group has invited the other. ProjectGroupLink is the join table between a Group
and a Project, indicating the Group has been invited to the Project.

SaaS has some additional complexity when it comes to the relationships that
determine whether or not a User is considered a Billable Member, particularly
relating to Group/Project membership that can often lead to confusion. An
example of that are Members of a Group that have been invited into another Group
or Project and therewith become billable.

There are two charts as the flow is different for each: [SaaS](https://mermaid.live/view#pako:eNqNVl1v2jAU_StXeS5M-3hCU6N2aB3SqKbSPkyAhkkuxFsSs9hpVUX899mxYxsnlOWFcH1877nnfkATJSzFaBLtcvaSZKQS8DhdlWCeijGxXBCygCeOFdzSPCfbHOGrRK9Ho2tlvUkEfcZmo97HXBCBG6AcSGuOj86ZA8No_BP5eHQNMz7HYovV8kuGyR-gOx1I3Qd9Ap-31btrtgORITxIPnBXsfoAGcWKVEn2uj4T4Z6pAPdMdKyX8t2mIG-5ex0LkCnBdO4OOrOhO-O3TDQzrkkSkN9izW-BCCUTCB-8hGU866Bl45FxKJ-GdGiDDYI7SOtOp7o0GW90rA20NYjXQxE6cWSaGr1Q2BnX9hCnIbZWc1reJAly3pisMsJ19vKEFiQHfQw5PmMenwqhPQ5Uxa-DjeAa5IJk_g3t-hvdZ8jFA8vxrpYvccfWHIA6aVmrLtMQj2rvuqPynSZYcnx8PWDzlAuZsay3MfouPJxl1c9hKFCIPedzSBuH5fV2X5FDBrT8Zadk2bbszJur_xsp9UznzZRWmIizV-Njx346X9TbPpwoVqO9xobebUZmF3gse0yk9wA-jDBkflTst2TS-EyMTcrTZmGz7hPrkG8HdChdv1n5TAWmGuxHLmXI9qgTza9aO93-TVfnobAh1M6V0VDtuk7E0w313tMUy3Swc_Tyll9VLUwMPcFxUJGBNdKYTTTwY-ByesC_qusx1Yk0bXtao9kk8Snzj8eLsX0lwqV2ujnUE5Bw7FT4g7QbQGM-4YWoXPRZ2C7BnT4TXZPSiAHFUIP3nVhGbiN3G9-OyKWsTvpSS60yMYZA5U_HtyQzdy7p7GCBon65OyXNWJwT9DSNMwF7YB3Xly1o--gqKrAqCE3l359GHa4iuQ8KXEUT-ZrijtS5WEWr8iihpBZs8Vom0WRHco5XUX1IZd9NKZETUxjr8R82ROYl) and [SM](https://mermaid.live/view#pako:eNqFk1FvwiAQx7_KhefVD-CDZo2JNdmcWe3DYpeI7alsLRgKLob0u48qtqxRx9Plz4-7-3NgSCZyJEOyLcRPtqdSwXKScnBLVyhXswrUHiGxMYSsKOimwPHnXwiCYNQAsaIKzXOm2BFh3ShrOGvjujvQghAMPrAaBCOITKRLyu9Rc9FAc6Gu9VPegVELLEKzkOILMwWhUH6yRdhCcWJilEeWXSz5VJzcqrWycWvc830rOmdwnmZ8KoU-vEnXU6-bf6noPmResdzYWxdboHDeAiHBbfqOuqifonX6Ym-CV7g8HfAhfZ0U2-2xUu-iwKm2wdg4BRoJWAUXufZH5JnqH-8ye42YpFCsbGbvRN-Tx7UmunfxqFCfvZfTNeS9AfJESpQlZbn9K6Y5lxL7KUpMydCGOZXfKUl5bTmqlYhPPCNDJTU-EX3IrZEJoztJy4tY_wJJwxFj).

## How can Users switch between different Organizations?

For Organizations in the context of Cells 1.0, Users will only be able to be
part of a single Organization. If a user wants to be part of multiple
Organizations, they have to join every additional Organization with a new user
account.

Later, in the context of Cells 1.5, Users can utilize a
[context switcher](https://gitlab.com/gitlab-org/gitlab/-/issues/411637). This feature
allows easy navigation and access to different Organizations' content and
settings. By clicking on the context switcher and selecting a specific
Organization from the provided list, Users can seamlessly transition their view
and permissions, enabling them to interact with the resources and
functionalities of the chosen Organization.

## What happens when a User is deleted?

We've identified three different scenarios where a User can be removed from an
Organization:

1. Removal: The User is removed from the `organization_users` table. This is
   similar to the User leaving a company, but the User can join the Organization
   again after access approval.
1. Banning: The User is banned. This can happen in case of misconduct, but the
   User cannot be added again to the Organization until they are unbanned. In
   this case, we keep the `organization_users` entry and change the permission
   to none.
1. Deleting: The User is deleted. We assign everything the User has authored to
   the Ghost User and delete the entry from the `organization_users` table.

As part of the Organization MVC, Organization Owners can remove Organization
members. This means that the User's membership entries are deleted from all
Groups and Projects that are contained within the Organization. In addition, the
User entry is removed from the `organization_users` table.

Actions such as banning and deleting a User will be added to the Organization at
a later point.

## Organization Non-Users

Non-Users are external to the Organization and can only access the public
resources of an Organization, such as public Projects.

@ -62,7 +62,11 @@ You can also refer to fields of [Work Item](../../../api/graphql/reference/index

### Work Item widgets

All Work Item types share the same pool of predefined widgets and are customized
by which widgets are active on a specific type. The list of widgets for any
certain Work Item type is currently predefined and is not customizable. However,
in the future we plan to allow users to create new Work Item types and define a
set of widgets for them.
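
A schematic sketch of that composition (the type and widget names are
illustrative, not the actual backend definitions):

```python
from dataclasses import dataclass

# The shared pool every type draws from; only a selection is active per type.
WIDGET_POOL = frozenset({"assignees", "labels", "hierarchy", "milestone", "weight"})

@dataclass(frozen=True)
class WorkItemType:
    name: str
    widgets: frozenset  # the widgets active for this type

ISSUE = WorkItemType("Issue", frozenset({"assignees", "labels", "milestone", "weight"}))
TASK = WorkItemType("Task", frozenset({"assignees", "labels", "hierarchy"}))
assert ISSUE.widgets <= WIDGET_POOL and TASK.widgets <= WIDGET_POOL

def render(work_item_type: WorkItemType):
    # The frontend renders whatever widgets the type declares; nothing else.
    for widget in sorted(work_item_type.widgets):
        print(f"render {widget} widget for {work_item_type.name}")

render(TASK)
```
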
### Widget types (updating)

@ -139,7 +143,7 @@ As types expand, and parent items have their own parent items, the hierarchy cap

Currently, following are the allowed Parent-child relationships:

| Type  | Can be parent of | Can be child of |
|-------|------------------|-----------------|
| Epic  | Epic             | Epic            |
| Issue | Task             | Epic            |

@@ -176,15 +180,39 @@ Work Items main goal is to enhance the planning toolset to become the most popul

### Scalability

Currently, different entities like issues, epics, and merge requests share many similar features, but these features are implemented separately for every entity type. This makes implementing new features or refactoring existing ones problematic: for example, if we plan to add a new feature to issues and incidents, we would need to implement it separately on the issue and incident types. With work items, any new feature is implemented via widgets for all existing types, which makes the architecture more scalable.

### Flexibility

With the existing implementation, we have a rigid structure for issuables, merge requests, epics, etc. This structure is defined on both the backend and frontend, so any change requires a coordinated effort. Also, it would be very hard to make this structure customizable for the user without introducing a set of flags to enable or disable any existing feature. The Work Item architecture allows the frontend to display Work Item widgets in a flexible way: whatever is present in Work Item widgets will be rendered on the page. This allows us to make changes fast and makes the structure far more flexible. For example, if we want to stop displaying labels on the Incident page, we remove the labels widget from the Incident Work Item type on the backend. In the future, this will also allow users to define the set of widgets they want to see on custom Work Item types.

### A consistent experience

As much as we try to have consistent behavior for similar features on different entities, we still have differences in the implementation. For example, updating labels on a merge request via the GraphQL API can be done with the dedicated `setMergeRequestLabels` mutation, while for an issue we call the more coarse-grained `updateIssue`. This provides an inconsistent experience for both frontend and external API users. As a result, epics, issues, requirements, and others all have similar but just subtle enough differences in common interactions that the user needs to hold a complicated mental model of how they each behave.

The Work Item architecture is designed to make all features consistent across all types, implemented as Work Item widgets.

@@ -17,11 +17,10 @@ DETAILS:

GitLab administrators can use the Runner Fleet Dashboard to assess the health of your instance runners.
The Runner Fleet Dashboard shows:

- Recent CI errors caused by runner infrastructure
- Number of concurrent jobs executed on most busy runners
- Compute minutes used by instance runners
- Job queue times (available only with [ClickHouse](#enable-more-ci-analytics-features-with-clickhouse))



@@ -45,7 +44,8 @@ These features require [setting up an additional infrastructure](#enable-more-ci

Prerequisites:

- You must have administrator access to the instance.
- You must enable the [ClickHouse integration](../../integration/clickhouse.md).

To analyze runner usage, you can export a CSV file that contains the number of jobs and executed runner minutes. The CSV file shows the runner type and job status for each project. The CSV file is sent to your email when the export completes.

@@ -62,16 +62,15 @@ To export compute minutes used by instance runners:

DETAILS:
**Tier:** Ultimate
**Offering:** GitLab.com, Self-managed, GitLab Dedicated
**Status:** Beta

> - [Introduced](https://gitlab.com/groups/gitlab-org/-/epics/11180) as an [experiment](../../policy/experiment-beta-support.md#experiment) in GitLab 16.7 with [flags](../../administration/feature_flags.md) named `ci_data_ingestion_to_click_house` and `clickhouse_ci_analytics`. Disabled by default.
> - [Enabled on GitLab.com, self-managed, and GitLab Dedicated](https://gitlab.com/gitlab-org/gitlab/-/issues/424866) in GitLab 16.10. Feature flags `ci_data_ingestion_to_click_house` and `clickhouse_ci_analytics` removed.
> - [Changed](https://gitlab.com/gitlab-org/gitlab/-/issues/424789) to [beta](../../policy/experiment-beta-support.md#beta) in GitLab 17.1.

WARNING:
This feature is in [beta](../../policy/experiment-beta-support.md#beta) and subject to change without notice.
For more information, see [epic 11180](https://gitlab.com/groups/gitlab-org/-/epics/11180).

To enable additional CI analytics features, [configure the ClickHouse integration](../../integration/clickhouse.md).

@@ -0,0 +1,46 @@

---
stage: Verify
group: Runner
info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://handbook.gitlab.com/handbook/product/ux/technical-writing/#assignments
---

# Runner fleet dashboard for groups

DETAILS:
**Tier:** Ultimate
**Offering:** GitLab.com, Self-managed
**Status:** Beta

> - [Introduced](https://gitlab.com/gitlab-org/gitlab/-/issues/424789) as a [beta](../../policy/experiment-beta-support.md#beta) in GitLab 17.1.

Users with at least the Maintainer role for a group can use the runner fleet dashboard to assess the health of group runners.


## Dashboard metrics

The following metrics are available in the runner fleet dashboard:

| Metric | Description |
|-------------------------------|-------------|
| Online | Number of online runners. In the Admin Area, this metric displays the number of runners for the entire instance. In a group, this metric displays the number of runners for the group and its subgroups. |
| Offline | Number of offline runners. |
| Active runners | Number of active runners. |
| Runner usage (previous month) | Number of compute minutes used by each project on group runners. Includes the option to export as CSV for cost analysis. |
| Wait time to pick a job | Displays the mean wait time for runners. This metric provides insight into whether the runners can service the CI/CD job queue within your organization's target service-level objectives. The data that creates this metric widget is updated every 24 hours. |

## View the runner fleet dashboard for groups

Prerequisites:
- You must have the Maintainer role for the group.
To view the runner fleet dashboard for groups:

1. On the left sidebar, select **Search or go to** and find your group.
1. Select **Build > Runners**.
1. Select **Fleet dashboard**.

For self-managed GitLab instances, most of the dashboard metrics work without any additional configuration. To use the **Runner usage** and **Wait time to pick a job** metrics, you must [configure the ClickHouse analytics database](runner_fleet_dashboard.md#enable-more-ci-analytics-features-with-clickhouse).

@@ -13,9 +13,24 @@ the [Elasticsearch integration documentation](../integration/advanced_search/ela

## Deep Dive

In June 2019, Mario de la Ossa hosted a Deep Dive (GitLab team members only: `https://gitlab.com/gitlab-org/create-stage/-/issues/1`) on the GitLab [Elasticsearch integration](../integration/advanced_search/elasticsearch.md) to share his domain-specific knowledge with anyone who may work in this part of the codebase in the future. You can find the <i class="fa fa-youtube-play youtube" aria-hidden="true"></i> [recording on YouTube](https://www.youtube.com/watch?v=vrvl-tN2EaA), and the slides on [Google Slides](https://docs.google.com/presentation/d/1H-pCzI_LNrgrL5pJAIQgvLX8Ji0-jIKOg1QeJQzChug/edit) and in [PDF](https://gitlab.com/gitlab-org/create-stage/uploads/c5aa32b6b07476fa8b597004899ec538/Elasticsearch_Deep_Dive.pdf). Everything covered in this deep dive was accurate as of GitLab 12.0, and while specific details might have changed, it should still serve as a good introduction.

In August 2020, a second Deep Dive was hosted, focusing on [GitLab-specific architecture for multi-indices support](#zero-downtime-reindexing-with-multiple-indices). The <i class="fa fa-youtube-play youtube" aria-hidden="true"></i> [recording on YouTube](https://www.youtube.com/watch?v=0WdPR9oB2fg) and the [slides](https://lulalala.gitlab.io/gitlab-elasticsearch-deepdive/) are available. Everything covered in this deep dive was accurate as of GitLab 13.3.

## Supported Versions

@@ -36,11 +51,24 @@ Additionally, if you need large repositories or multiple forks for testing, cons

## How does it work?

The Elasticsearch integration depends on an external indexer. We ship an [indexer written in Go](https://gitlab.com/gitlab-org/gitlab-elasticsearch-indexer). The user must trigger the initial indexing via a Rake task but, after this is done, GitLab itself will trigger reindexing when required via `after_` callbacks on create, update, and destroy that are inherited from [`/ee/app/models/concerns/elastic/application_versioned_search.rb`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/ee/app/models/concerns/elastic/application_versioned_search.rb).

After initial indexing is complete, create, update, and delete operations for all models except projects (see [#207494](https://gitlab.com/gitlab-org/gitlab/-/issues/207494)) are tracked in a Redis [`ZSET`](https://redis.io/docs/latest/develop/data-types/#sorted-sets). A regular `sidekiq-cron` `ElasticIndexBulkCronWorker` processes this queue, updating many Elasticsearch documents at a time with the [Bulk Request API](https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-bulk.html).
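
For illustration only, here is a minimal sketch of what such a bulk update looks like on the wire, using just the Go standard library against the documented `_bulk` endpoint. This is not GitLab's indexing code; the index name, document shape, and endpoint URL are hypothetical placeholders:

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// doc is a hypothetical document payload.
type doc struct {
	ID    string `json:"id"`
	Title string `json:"title"`
}

func main() {
	docs := []doc{{ID: "1", Title: "first"}, {ID: "2", Title: "second"}}

	// The Bulk API takes newline-delimited JSON: an action line followed
	// by the document source, one pair per document.
	var body bytes.Buffer
	for _, d := range docs {
		action, _ := json.Marshal(map[string]any{
			"index": map[string]any{"_index": "example-index", "_id": d.ID},
		})
		source, _ := json.Marshal(d)
		body.Write(action)
		body.WriteByte('\n')
		body.Write(source)
		body.WriteByte('\n') // the payload must end with a newline
	}

	resp, err := http.Post("http://localhost:9200/_bulk",
		"application/x-ndjson", &body)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	fmt.Println("bulk request status:", resp.Status)
}
```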

Search queries are generated by the concerns found in [`ee/app/models/concerns/elastic`](https://gitlab.com/gitlab-org/gitlab/-/tree/master/ee/app/models/concerns/elastic). These concerns are also in charge of access control, and have been a historic source of security bugs, so pay close attention to them!

### Custom routing

@@ -54,7 +82,8 @@ during indexing and searching operations. Some of the benefits and tradeoffs to

## Existing analyzers and tokenizers

The following analyzers and tokenizers are defined in [`ee/lib/elastic/latest/config.rb`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/ee/lib/elastic/latest/config.rb).

### Analyzers

@@ -72,7 +101,9 @@ See the `sha_tokenizer` explanation later below for an example.

#### `code_analyzer`

Used when indexing a blob's filename and content. Uses the `whitespace` tokenizer and the [`word_delimiter_graph`](https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-word-delimiter-graph-tokenfilter.html), `lowercase`, and `asciifolding` filters.

The `whitespace` tokenizer was selected to have more control over how tokens are split. For example, the string `Foo::bar(4)` needs to generate tokens like `Foo` and `bar(4)` to be properly searched.
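
As a rough illustration of why whitespace splitting matters here (this is not the analyzer's code; Go's `strings.Fields` merely stands in for a whitespace tokenizer):

```go
package main

import (
	"fmt"
	"strings"
)

func main() {
	// Splitting on whitespace only keeps `Foo::bar(4)` together as a
	// single token; finer-grained splitting is left to token filters
	// such as word_delimiter_graph.
	fmt.Println(strings.Fields("def call() { Foo::bar(4) }"))
	// Output: [def call() { Foo::bar(4) }]
}
```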

@@ -85,7 +116,9 @@ The [Elasticsearch `code_analyzer` doesn't account for all code cases](../integr

#### `sha_tokenizer`

This is a custom tokenizer that uses the [`edgeNGram` tokenizer](https://www.elastic.co/guide/en/elasticsearch/reference/5.5/analysis-edgengram-tokenizer.html) to allow a SHA to be searchable by any subset of it (minimum of 5 chars).

Example:

@@ -100,7 +133,9 @@ Example:

#### `path_tokenizer`

This is a custom tokenizer that uses the [`path_hierarchy` tokenizer](https://www.elastic.co/guide/en/elasticsearch/reference/5.5/analysis-pathhierarchy-tokenizer.html) with `reverse: true` to allow searches to find paths no matter how much or how little of the path is given as input.

Example:

@@ -8,16 +8,25 @@ info: Any user with at least the Maintainer role can merge updates to this conte

## Utility Classes

To reduce the generation of more CSS as our site grows, prefer the use of utility classes over adding new CSS. In complex cases, CSS can be addressed by adding component classes.

### Where are CSS utility classes defined?
Utility classes are generated by [Tailwind CSS](https://tailwindcss.com/). Use [Tailwind CSS autocomplete](#tailwind-css-autocomplete) or the [official Tailwind CSS documentation](#official-tailwind-css-documentation) to see available CSS utility classes.

There are also legacy CSS utility classes defined in `config/helpers/tailwind/css_in_js.js`. These CSS utility classes do not comply with Tailwind CSS naming conventions and will be [iteratively migrated](https://gitlab.com/groups/gitlab-org/-/epics/13521) to the Tailwind CSS equivalent. Please do not add new instances of these CSS utility classes; instead, use the Tailwind CSS equivalent.

Classes in [`utilities.scss`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/app/assets/stylesheets/utilities.scss) and [`common.scss`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/app/assets/stylesheets/framework/common.scss) are being deprecated. Classes in [`common.scss`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/app/assets/stylesheets/framework/common.scss) that use non-design-system values should be avoided. Use classes with conforming values instead.

Avoid [Bootstrap's Utility Classes](https://getbootstrap.com/docs/4.3/utilities/).

@@ -37,39 +46,58 @@ and implementation details.

#### Tailwind CSS basics

Below are some Tailwind CSS basics and information about how it has been configured to use the [Pajamas design system](https://design.gitlab.com/). For a more in-depth guide, see the [official Tailwind CSS documentation](https://tailwindcss.com/docs/utility-first).

##### Prefix

We have configured Tailwind CSS to use a [prefix](https://tailwindcss.com/docs/configuration#prefix), so all utility classes are prefixed with `gl-`. When using responsive utilities or state modifiers, the prefix goes after the colon.

**Examples:** `gl-mt-5`, `lg:gl-mt-5`.
##### Responsive CSS utility classes

[Responsive CSS utility classes](https://tailwindcss.com/docs/responsive-design) are prefixed with the breakpoint name, followed by the `:` character. The available breakpoints are configured in [tailwind.defaults.js#L44](https://gitlab.com/gitlab-org/gitlab-ui/-/blob/6612eaee37cdb4dd0258468c9f415be28c1053f0/tailwind.defaults.js#L44).

**Example:** `lg:gl-mt-5`
##### Hover, focus, and other state modifiers

[State modifiers](https://tailwindcss.com/docs/hover-focus-and-other-states) can be used to conditionally apply any Tailwind CSS class. Prefix the CSS utility class with the name of the modifier, followed by the `:` character.

**Example:** `hover:gl-underline`
##### `!important` modifier

You can use the [important modifier](https://tailwindcss.com/docs/configuration#important-modifier) by adding `!` to the beginning of the CSS utility class. When used in conjunction with responsive utility classes or state modifiers, the `!` goes after the `:` character.

**Examples:** `!gl-mt-5`, `lg:!gl-mt-5`, `hover:!gl-underline`
##### Spacing and sizing CSS utility classes

Spacing and sizing CSS utility classes (e.g. `margin`, `padding`, `width`, `height`) use our spacing scale defined in [tailwind.defaults.js#L4](https://gitlab.com/gitlab-org/gitlab-ui/-/blob/6612eaee37cdb4dd0258468c9f415be28c1053f0/tailwind.defaults.js#L4). They will use the naming conventions documented in the [official Tailwind CSS documentation](https://tailwindcss.com/docs/installation) but the scale will not match. When using the [Tailwind CSS autocomplete](#tailwind-css-autocomplete), our configured spacing scale will be shown.

**Example:** `gl-mt-5` will be `margin-top: 1rem;`
##### Color CSS utility classes

Color CSS utility classes (e.g. `color` and `background-color`) use colors defined in [src/tokens/build/tailwind/tokens.cjs](https://gitlab.com/gitlab-org/gitlab-ui/-/blob/24a08b50da6bd3d34fb3f8d24f84436d90d165f6/src/tokens/build/tailwind/tokens.cjs). They will use the naming conventions documented in the [official Tailwind CSS documentation](https://tailwindcss.com/docs/installation) but the color names will not match. When using the [Tailwind CSS autocomplete](#tailwind-css-autocomplete), our configured colors will be shown.

**Example:** `gl-text-red-500` will be `color: var(--red-500, #dd2b0e);`
#### Building the Tailwind CSS bundle

@@ -84,11 +112,15 @@ However the bundle gets built, the output is saved to `app/assets/builds/tailwin

#### Tailwind CSS autocomplete

Tailwind CSS autocomplete will list all available classes in your code editor. Keep in mind it will also list legacy CSS utilities. Unfortunately, we don't have a way to mark the legacy CSS utility classes in the autocomplete, so try to cross-reference with the [official Tailwind CSS documentation](#official-tailwind-css-documentation) if you are unsure.

##### VS Code

Install the [Tailwind CSS IntelliSense](https://marketplace.visualstudio.com/items?itemName=bradlc.vscode-tailwindcss) extension. For best results, and for HAML and custom `*-class` prop support, these are the recommended settings:

```json
{
```

@@ -129,11 +161,21 @@ For full HAML and custom `*-class` prop support these are the recommended update

#### Official Tailwind CSS documentation

GitLab defines its own Tailwind CSS config in [tailwind.defaults.js](https://gitlab.com/gitlab-org/gitlab-ui/-/blob/6612eaee37cdb4dd0258468c9f415be28c1053f0/tailwind.defaults.js) to match the Pajamas design system and to prefix CSS utility classes with `gl-`. This means that in the [official Tailwind CSS documentation](https://tailwindcss.com/docs/installation) the spacing, sizing, and color CSS utility classes may not match. Also, the `gl-` prefix will not be shown. Here is our [spacing scale](https://gitlab.com/gitlab-org/gitlab-ui/-/blob/6612eaee37cdb4dd0258468c9f415be28c1053f0/tailwind.defaults.js#L4) and [colors](https://gitlab.com/gitlab-org/gitlab-ui/-/blob/24a08b50da6bd3d34fb3f8d24f84436d90d165f6/src/tokens/build/tailwind/tokens.cjs). In the future, we plan to use [Tailwind config viewer](https://github.com/rogden/tailwind-config-viewer) to have a Tailwind CSS documentation site specific to GitLab.

### Where should you put new utility classes?

Utility classes are generated by [Tailwind CSS](https://tailwindcss.com/), which supports most CSS features. If there is something that is not available, we should update `tailwind.defaults.js` in GitLab UI.

### When should you create component classes?

@@ -142,7 +184,10 @@ We recommend a "utility-first" approach.

1. Start with utility classes.
1. If composing utility classes into a component class removes code duplication and encapsulates a clear responsibility, do it.

This encourages an organic growth of component classes and prevents the creation of one-off non-reusable classes. Also, the kind of classes that emerge from "utility-first" tend to be design-centered (for example, `.button`, `.alert`, `.card`) rather than domain-centered (for example, `.security-report-widget`, `.commit-header-icon`).

Inspiration:

@@ -151,7 +196,12 @@ Inspiration:

### Utility mixins

We are currently in the process of [migrating to Tailwind](#tailwind-css). The migration removes utility mixins, so please do not add any new usages of utility mixins. Instead, you can use the [`@apply` directive](https://tailwindcss.com/docs/reusing-styles#extracting-classes-with-apply) to add Tailwind styles to a CSS selector. `@apply` should be used for any CSS properties that depend on our design system (e.g. `margin`, `padding`). For CSS properties that are unit-less (e.g. `display: flex`), it is okay to use CSS properties directly.

```scss
// Bad
```

@@ -290,14 +340,19 @@ renaming without breaking styling.

## Using `extend` at-rule

Usage of the `extend` at-rule is prohibited due to [memory leaks](https://gitlab.com/gitlab-org/gitlab/-/issues/323021) and because [the rule doesn't work as it should](https://sass-lang.com/documentation/breaking-changes/extend-compound/).

## Linting

We use [stylelint](https://stylelint.io) to check for style guide conformity. It uses the ruleset in `.stylelintrc` and rules from [our SCSS configuration](https://gitlab.com/gitlab-org/frontend/gitlab-stylelint-config). `.stylelintrc` is located in the home directory of the project.

To check if any warnings are produced by your changes, run `yarn lint:stylelint` in the GitLab directory. Stylelint also runs in GitLab CI/CD to catch any warnings.

If the Rake task is throwing warnings you don't understand, SCSS Lint's

@@ -110,7 +110,8 @@ projects:

### Automatic linting

WARNING:
The use of `registry.gitlab.com/gitlab-org/gitlab-build-images:golangci-lint-alpine` has been [deprecated as of 16.10](https://gitlab.com/gitlab-org/gitlab-build-images/-/issues/131).

Use the upstream version of [golangci-lint](https://golangci-lint.run/).
See the list of linters [enabled/disabled by default](https://golangci-lint.run/usage/linters/#enabled-by-default).

@@ -143,7 +144,8 @@ Once [recursive includes](https://gitlab.com/gitlab-org/gitlab-foss/-/issues/568

become available, you can share job templates like this
[analyzer](https://gitlab.com/gitlab-org/security-products/ci-templates/raw/master/includes-dev/analyzer.yml).

Go GitLab linter plugins are maintained in the [`gitlab-org/language-tools/go/linters`](https://gitlab.com/gitlab-org/language-tools/go/linters/) namespace.

### Help text style guide

@@ -487,13 +489,29 @@ golanci-lint rule automatically check for this.

### Analyzer Tests

The conventional Secure [analyzer](https://gitlab.com/gitlab-org/security-products/analyzers/) has a [`convert` function](https://gitlab.com/gitlab-org/security-products/analyzers/command/-/blob/main/convert.go#L15-17) that converts SAST/DAST scanner reports into [GitLab Security Reports](https://gitlab.com/gitlab-org/security-products/security-report-schemas). When writing tests for the `convert` function, we should make use of [test fixtures](https://dave.cheney.net/2016/05/10/test-fixtures-in-go) using a `testdata` directory at the root of the analyzer's repository. The `testdata` directory should contain two subdirectories: `expect` and `reports`. The `reports` directory should contain sample SAST/DAST scanner reports which are passed into the `convert` function during the test setup. The `expect` directory should contain the expected GitLab Security Report that `convert` returns. See Secret Detection for an [example](https://gitlab.com/gitlab-org/security-products/analyzers/secrets/-/blob/160424589ef1eed7b91b59484e019095bc7233bd/convert_test.go#L13-66).
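
As an illustrative sketch only (the real `convert` signature and fixture layout live in each analyzer repository and may differ; the names below are hypothetical), a fixture-based test might look like this:

```go
package main

import (
	"bytes"
	"os"
	"path/filepath"
	"testing"
)

// convertForTest stands in for the analyzer's convert function,
// which turns a scanner report into a GitLab Security Report.
func convertForTest(scannerReport []byte) ([]byte, error) {
	// ...transformation elided in this sketch...
	return scannerReport, nil
}

func TestConvert(t *testing.T) {
	// Sample scanner report passed into convert during test setup.
	input, err := os.ReadFile(filepath.Join("testdata", "reports", "scanner.json"))
	if err != nil {
		t.Fatal(err)
	}
	// Expected GitLab Security Report that convert should return.
	want, err := os.ReadFile(filepath.Join("testdata", "expect", "report.json"))
	if err != nil {
		t.Fatal(err)
	}

	got, err := convertForTest(input)
	if err != nil {
		t.Fatal(err)
	}
	if !bytes.Equal(got, want) {
		t.Errorf("unexpected report:\ngot:  %s\nwant: %s", got, want)
	}
}
```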

If the scanner report is small, less than 35 lines, then feel free to [inline the report](https://gitlab.com/gitlab-org/security-products/analyzers/sobelow/-/blob/8bd2428a/convert/convert_test.go#L13-77) rather than use a `testdata` directory.

#### Test Diffs

The [go-cmp](https://github.com/google/go-cmp) package should be used when comparing large structs in tests. It makes it possible to output a specific diff where the two structs differ, rather than seeing the whole of both structs printed out in the test logs. Here is a small example:

```go
package main

@@ -543,7 +561,9 @@ func TestHelloWorld(t *testing.T) {

}
```

The output demonstrates why `go-cmp` is far superior when comparing large structs. Even though you could spot the difference in this small example, it quickly gets unwieldy as the data grows.

```plaintext
main_test.go:36: reflect comparison:
```

@@ -7,30 +7,90 @@ info: To determine the technical writer assigned to the Stage/Group associated w

# Corporate contributor license agreement

You accept and agree to the following terms and conditions for Your present and future Contributions submitted to GitLab B.V. Except for the license granted herein to GitLab B.V. and recipients of software distributed by GitLab B.V., You reserve all right, title, and interest in and to Your Contributions.

"1." **Definitions:**

"You" (or "Your") shall mean the copyright owner or legal entity authorized by the copyright owner that is making this Agreement with GitLab B.V. For legal entities, the entity making a Contribution and all other entities that control, are controlled by, or are under common control with that entity are considered to be a single Contributor. For the purposes of this definition, "control" means (i) the power, direct or indirect, to cause the direction or management of such entity, whether by contract or otherwise, or (ii) ownership of fifty percent (50%) or more of the outstanding shares, or (iii) beneficial ownership of such entity.

"Contribution" shall mean the code, documentation or other original works of authorship, including any modifications or additions to an existing work, that is submitted by You to GitLab B.V. for inclusion in, or documentation of, any of the products owned or managed by GitLab B.V. (the "Work"). For the purposes of this definition, "submitted" means any form of electronic, verbal, or written communication sent to GitLab B.V. or its representatives, including but not limited to communication on electronic mailing lists, source code control systems, and issue tracking systems that are managed by, or on behalf of, GitLab B.V. for the purpose of discussing and improving the Work, but excluding communication that is conspicuously marked or otherwise designated in writing by You as "Not a Contribution."

"2." **Grant of Copyright License:**
Subject to the terms and conditions of this Agreement, You hereby grant to GitLab B.V. and to recipients of software distributed by GitLab B.V. a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable copyright license to reproduce, prepare derivative works of, publicly display, publicly perform, sublicense, and distribute Your Contributions and such derivative works.

"3." **Grant of Patent License:**
Subject to the terms and conditions of this Agreement, You hereby grant to GitLab B.V. and to recipients of software distributed by GitLab B.V. a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable (except as stated in this section) patent license to make, have made, use, offer to sell, sell, import, and otherwise transfer the Work, where such license applies only to those patent claims licensable by You that are necessarily infringed by Your Contribution(s) alone or by combination of Your Contribution(s) with the Work to which such Contribution(s) was submitted. If any entity institutes patent litigation against You or any other entity (including a cross-claim or counterclaim in a lawsuit) alleging that your Contribution, or the Work to which you have contributed, constitutes direct or contributory patent infringement, then any patent licenses granted to that entity under this Agreement for that Contribution or Work shall terminate as of the date such litigation is filed.

You represent that You are legally entitled to grant the above license. You represent further that each of Your employees is authorized to submit Contributions on Your behalf, but excluding employees that are designated in writing by You as "Not authorized to submit Contributions on behalf of (name of Your corporation here)." Such designations of exclusion for unauthorized employees are to be submitted via email to `legal@gitlab.com`. It is Your responsibility to notify GitLab B.V. when any change is required to the list of designated employees excluded from submitting Contributions on Your behalf. Such notification should also be sent via email to `legal@gitlab.com`.

"4." **Contributions:**
You represent that each of Your Contributions is Your original creation.
Should You wish to submit work that is not Your original creation, You may submit it to GitLab B.V. separately from any Contribution, identifying the complete details of its source and of any license or other restriction (including, but not limited to, related patents, trademarks, and license agreements) of which you are personally aware, and conspicuously marking the work as "Submitted on behalf of a third-party: (named here)".

You are not expected to provide support for Your Contributions, except to the extent You desire to provide support. You may provide support for free, for a fee, or not at all. Unless required by applicable law or agreed to in writing, You provide Your Contributions on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied, including, without limitation, any warranties or conditions of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A PARTICULAR PURPOSE.

This text is licensed under the [Creative Commons Attribution 3.0 License](https://creativecommons.org/licenses/by/3.0/) and the original source is the Google Open Source Programs Office.

@@ -37,7 +37,15 @@ These integrations have to do with using GitLab to build application workloads a

[12/28/2023 AWS Release Announcement for Self-Managed / Dedicated](https://aws.amazon.com/about-aws/whats-new/2023/12/codepipeline-gitlab-self-managed/)

**AWS CodeStar Connections** - enables SCM connections to multiple AWS Services. [Configure GitLab](https://docs.aws.amazon.com/dtconsole/latest/userguide/connections-create-gitlab.html). [Supported Providers](https://docs.aws.amazon.com/dtconsole/latest/userguide/supported-versions-connections.html). [Supported AWS Services](https://docs.aws.amazon.com/dtconsole/latest/userguide/integrations-connections.html) - each one may have to make updates to support GitLab, so here is the subset that supports GitLab. This works with GitLab.com SaaS, GitLab Self-Managed, and GitLab Dedicated. AWS CodeStar connections are not available in all AWS regions - the exclusion list is [documented here](https://docs.aws.amazon.com/codepipeline/latest/userguide/action-reference-CodestarConnectionSource.html). ([12/28/2023](https://aws.amazon.com/about-aws/whats-new/2023/12/codepipeline-gitlab-self-managed/)) `[AWS Built]`

[Video Explanation of AWS CodeStar Connection Integration for AWS (1 min)](https://youtu.be/f7qTSa_bNig)

@@ -106,7 +106,23 @@ As a single-tenant SaaS solution, GitLab Dedicated provides infrastructure-level

#### Access controls

GitLab Dedicated adheres to the
[principle of least privilege](https://handbook.gitlab.com/handbook/security/access-management-policy/#principle-of-least-privilege)
to control access to customer tenant environments. Tenant AWS accounts live under
a top-level GitLab Dedicated AWS parent organization. Access to the AWS Organization
is restricted to select GitLab team members. All user accounts within the AWS Organization
follow the overall [GitLab Access Management Policy](https://handbook.gitlab.com/handbook/security/access-management-policy/).
Direct access to customer tenant environments is restricted to a single Hub account.
The GitLab Dedicated Control Plane uses the Hub account to perform automated actions
over tenant accounts when managing environments. Similarly, GitLab Dedicated engineers
do not have direct access to customer tenant environments.
In [break glass](https://gitlab.com/gitlab-com/gl-infra/gitlab-dedicated/team/-/blob/main/engineering/breaking_glass.md)
situations, where access to resources in the tenant environment is required to
address a high-severity issue, GitLab engineers must go through the Hub account
to manage those resources. This is done via an approval process, and after permission
is granted, the engineer will assume an IAM role on a temporary basis to access
tenant resources through the Hub account. All actions within the Hub account and
tenant account are logged to CloudTrail.
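As a rough sketch of the time-boxed access pattern described above (the role name, account ID, and session duration are illustrative assumptions, not GitLab's actual values):

```shell
# Illustrative only: assume a short-lived IAM role in a tenant account through
# the Hub account after break-glass approval. All names here are hypothetical.
aws sts assume-role \
  --role-arn "arn:aws:iam::111111111111:role/BreakGlassAccess" \
  --role-session-name "break-glass-$(date +%s)" \
  --duration-seconds 3600
```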
Inside tenant accounts, GitLab leverages Intrusion Detection and Malware Scanning capabilities from AWS GuardDuty. Infrastructure logs are monitored by the GitLab Security Incident Response Team to detect anomalous events.

@@ -220,7 +236,7 @@ The following GitLab application features are not available:
- View the [list of AI features to see which ones are supported](../../user/ai_features.md).
- Refer to our [direction page](https://about.gitlab.com/direction/saas-platforms/dedicated/#supporting-ai-features-on-gitlab-dedicated) for more information.
- Features other than [available features](#available-features) that must be configured outside of the GitLab user interface.
- Any functionality or feature behind a Feature Flag that is toggled `off` by default.

The following features will not be supported:
@@ -4432,16 +4432,34 @@ Administrators who need to add runners for multiple projects can register a runn

- Removal in GitLab <span class="milestone">15.0</span> ([breaking change](https://docs.gitlab.com/ee/update/terminology.html#breaking-change))

</div>
All functionality related to GitLab's Container Network Security and
Container Host Security categories is deprecated in GitLab 14.8 and
scheduled for removal in GitLab 15.0. Users who need a replacement for this
functionality are encouraged to evaluate the following open source projects
as potential solutions that can be installed and managed outside of GitLab:
[AppArmor](https://gitlab.com/apparmor/apparmor),
[Cilium](https://github.com/cilium/cilium),
[Falco](https://github.com/falcosecurity/falco),
[FluentD](https://github.com/fluent/fluentd),
[Pod Security Admission](https://kubernetes.io/docs/concepts/security/pod-security-admission/).

To integrate these technologies into GitLab, add the desired Helm charts
into your copy of the
[Cluster Management Project Template](https://docs.gitlab.com/ee/user/clusters/management_project_template.html).
Deploy these Helm charts in production by calling commands through GitLab
[CI/CD](https://docs.gitlab.com/ee/user/clusters/agent/ci_cd_workflow.html).
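For illustration, a minimal `helmfile.yaml` entry of the kind used in a cluster management project might look like the sketch below; the chart version and values path are assumptions, not tested values.

```yaml
# Illustrative helmfile.yaml entry in a cluster management project that
# installs the Cilium chart. Version and values path are placeholders.
repositories:
  - name: cilium
    url: https://helm.cilium.io/

releases:
  - name: cilium
    namespace: kube-system
    chart: cilium/cilium
    version: 1.14.5
    values:
      - applications/cilium/values.yaml
```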
As part of this change, the following specific capabilities within GitLab
are now deprecated, and are scheduled for removal in GitLab 15.0:

- The **Security & Compliance > Threat Monitoring** page.
- The `Network Policy` security policy type, as found on the **Security & Compliance > Policies** page.
- The ability to manage integrations with the following technologies through GitLab: AppArmor, Cilium, Falco, FluentD, and Pod Security Policies.
- All APIs related to the above functionality.

For additional context, or to provide feedback regarding this change,
please reference our open
[deprecation issue](https://gitlab.com/groups/gitlab-org/-/epics/7476).
</div>
@@ -164,7 +164,17 @@ apifuzzer_v2:
In the case of one or two slow operations, the team might decide to skip testing the operations, or exclude them from feature branch tests, but include them for default branch tests. Excluding the operation is done using the `FUZZAPI_EXCLUDE_PATHS` configuration variable, [as explained in this section](configuration/customizing_analyzer_settings.md#exclude-paths).

In this example, we have an operation that returns a large amount of data. The
operation is `GET http://target:7777/api/large_response_json`. To exclude it, we
provide the `FUZZAPI_EXCLUDE_PATHS` configuration variable with the path portion
of our operation URL, `/api/large_response_json`. Our configuration disables the
main `apifuzzer_fuzz` job and creates two new jobs, `apifuzzer_main` and
`apifuzzer_branch`. The `apifuzzer_branch` job is set up to exclude the long
operation and only run on non-default branches (for example, feature branches).
The `apifuzzer_main` job is set up to only execute on the default branch
(`main` in this example). The `apifuzzer_branch` jobs run faster, allowing for
quick development cycles, while the `apifuzzer_main` job, which only runs on
default branch builds, takes longer to run.
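A minimal `.gitlab-ci.yml` sketch of this split is shown below. It assumes the API Fuzzing CI/CD template is already included in the pipeline, and the rules are illustrative rather than the exact configuration shipped with GitLab.

```yaml
# Sketch only: disable the stock job and split fuzzing by branch type.
apifuzzer_fuzz:
  rules:
    - when: never  # replaced by the two jobs below

apifuzzer_main:
  extends: apifuzzer_fuzz
  rules:
    - if: $CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH  # full scan, default branch only

apifuzzer_branch:
  extends: apifuzzer_fuzz
  variables:
    FUZZAPI_EXCLUDE_PATHS: /api/large_response_json  # skip the slow operation
  rules:
    - if: $CI_COMMIT_BRANCH != $CI_DEFAULT_BRANCH  # faster scan on feature branches
```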
To verify the operation is excluded, run the API Fuzzing job and review the job console output. It includes a list of included and excluded operations at the end of the test.

@@ -12,9 +12,23 @@ Sensitive information disclosure check. This includes credit card numbers, healt
## Remediation
Sensitive information leakage is an application weakness where an application
reveals sensitive, user-specific data. Sensitive data may be used by an attacker
to exploit its users. Therefore, leakage of sensitive data should be limited or
prevented whenever possible. Information Leakage, in its most common form,
is the result of differences in page responses for valid versus invalid data.

Pages that provide different responses based on the validity of the data can
also lead to Information Leakage; specifically when data deemed confidential is
being revealed as a result of the web application's design. Examples of
sensitive data include (but are not limited to): account numbers, user
identifiers (driver's license numbers, passport numbers, Social Security
numbers, etc.) and user-specific information (passwords, sessions, addresses).
Information Leakage in this context deals with exposure of key user data deemed
confidential, or secret, that should not be exposed in plain view, even to the
user. Credit card numbers and other heavily regulated information are prime
examples of user data that needs to be further protected from exposure or
leakage even with proper encryption and access controls already in place.
## Links
@@ -12,14 +12,40 @@ Verify session cookie has correct flags and expiration.
## Remediation
Since HTTP is a stateless protocol, websites commonly use cookies to store
session IDs that uniquely identify a user from request to request. Consequently,
each session ID's confidentiality must be maintained in order to prevent
multiple users from accessing the same account. A stolen session ID can be used
to view another user's account or perform a fraudulent transaction. An
illustrative `Set-Cookie` header follows the list below.

- One part of securing session IDs is to properly mark them to expire and also
  require the correct set of flags to ensure they are not transmitted in the
  clear or accessible from scripting.
- HttpOnly is an additional flag included in a Set-Cookie HTTP response header.
  Using the HttpOnly flag when generating a cookie helps mitigate the risk of
  client-side script accessing the protected cookie (if the browser supports it).
  If the HttpOnly flag (optional) is included in the HTTP response header,
  the cookie cannot be accessed through client-side script (again, if the browser
  supports this flag). As a result, even if a cross-site scripting (XSS) flaw
  exists, and a user accidentally accesses a link that exploits this flaw, the
  browser will not reveal the cookie to a third party.
- The Secure attribute for sensitive cookies in HTTPS sessions is not set, which
  could cause the user agent to send those cookies in plaintext over an HTTP
  session.
- A session-related cookie was identified being used on an insecure transport
  protocol. Insecure transport protocols are those that do not make use of
  SSL/TLS to secure the connection. Examples of such protocols are 'http'.
- Insufficient Session Expiration occurs when a web application permits an
  attacker to reuse old session credentials or session IDs for authorization.
  Insufficient Session Expiration increases a website's exposure to attacks that
  steal or reuse users' session identifiers. Since HTTP is a stateless protocol,
  websites commonly use cookies to store session IDs that uniquely identify a
  user from request to request. Consequently, each session ID's confidentiality
  must be maintained in order to prevent multiple users from accessing the same
  account. A stolen session ID can be used to view another user's account or
  perform a fraudulent transaction. One part of securing session IDs is to
  properly mark them to expire and also require the correct set of flags to
  ensure they are not transmitted in the clear or accessible from scripting.
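As a concrete sketch, a session cookie carrying the flags and expiration discussed above might be issued as follows (the cookie name, value, and lifetime are placeholders):

```plaintext
Set-Cookie: session_id=<opaque-random-value>; Secure; HttpOnly; SameSite=Lax; Max-Age=3600; Path=/
```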
## Links

@@ -8,15 +8,38 @@ info: To determine the technical writer assigned to the Stage/Group associated w
## Description
Check for SQL and NoSQL injection vulnerabilities. A SQL injection attack
consists of insertion or "injection" of a SQL query via the input data from the
client to the application. A successful SQL injection exploit can read sensitive
data from the database, modify database data (Insert/Update/Delete), execute
administration operations on the database (such as shutdown the DBMS), recover
the content of a given file present on the DBMS file system, and in some cases
issue commands to the operating system. SQL injection attacks are a type of
injection attack, in which SQL commands are injected into data-plane input in
order to effect the execution of predefined SQL commands. This check modifies
parameters in the request (path, query string, headers, JSON, XML, etc.) to try
to create a syntax error in the SQL or NoSQL query. Logs and responses are then
analyzed to try to detect if an error occurred. If an error is detected, there is
a high likelihood that a vulnerability exists.
## Remediation
The software constructs all or part of an SQL command using
externally-influenced input from an upstream component, but it does not
neutralize, or incorrectly neutralizes, special elements that could modify the
intended SQL command when it is sent to a downstream component.

Without sufficient removal or quoting of SQL syntax in user-controllable inputs,
the generated SQL query can cause those inputs to be interpreted as SQL instead
of ordinary user data. This can be used to alter query logic to bypass security
checks, or to insert additional statements that modify the back-end database,
possibly including execution of system commands.

SQL injection has become a common issue with database-driven websites. The flaw
is easily detected and easily exploited, and as such, any site or software
package with even a minimal user base is likely to be subject to an attempted
attack of this kind. This flaw depends on the fact that SQL makes no real
distinction between the control and data planes.
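As an illustration of that missing distinction, the hypothetical query below shows how unneutralized input becomes control logic:

```plaintext
-- Query template with user input interpolated directly (unsafe):
SELECT * FROM users WHERE name = '<input>';

-- With input:  ' OR '1'='1
-- the data is reinterpreted as SQL control logic:
SELECT * FROM users WHERE name = '' OR '1'='1';
```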
## Links

@@ -1437,15 +1437,28 @@ display a color chip next to the color code. For example:
[View this topic in GitLab](https://gitlab.com/gitlab-org/gitlab/-/blob/master/doc/user/markdown.md#emoji).

Sometimes you want to <img src="https://gitlab.com/gitlab-org/gitlab-foss/raw/master/public/-/emojis/2/monkey.png" width="20px" height="20px" style="display:inline;margin:0;border:0;padding:0;" title=":monkey:" alt=":monkey:">
around a bit and add some <img src="https://gitlab.com/gitlab-org/gitlab-foss/raw/master/public/-/emojis/2/star2.png" width="20px" height="20px" style="display:inline;margin:0;border:0;padding:0;" title=":star2:" alt=":star2:">
to your <img src="https://gitlab.com/gitlab-org/gitlab-foss/raw/master/public/-/emojis/2/speech_balloon.png" width="20px" height="20px" style="display:inline;margin:0;border:0;padding:0;" title=":speech_balloon:" alt=":speech_balloon:">.
Well we have a gift for you:

<img src="https://gitlab.com/gitlab-org/gitlab-foss/raw/master/public/-/emojis/2/zap.png" width="20px" height="20px" style="display:inline;margin:0;border:0;padding:0;" title=":zap:" alt=":zap:">
You can use emoji anywhere GitLab Flavored Markdown is supported.
<img src="https://gitlab.com/gitlab-org/gitlab-foss/raw/master/public/-/emojis/2/v.png" width="20px" height="20px" style="display:inline;margin:0;border:0;padding:0;" title=":v:" alt=":v:">

You can use it to point out a <img src="https://gitlab.com/gitlab-org/gitlab-foss/raw/master/public/-/emojis/2/bug.png" width="20px" height="20px" style="display:inline;margin:0;border:0;padding:0;" title=":bug:" alt=":bug:">
or warn about <img src="https://gitlab.com/gitlab-org/gitlab-foss/raw/master/public/-/emojis/2/speak_no_evil.png" width="20px" height="20px" style="display:inline;margin:0;border:0;padding:0;" title=":speak_no_evil:" alt=":speak_no_evil:">
patches. If someone improves your really <img src="https://gitlab.com/gitlab-org/gitlab-foss/raw/master/public/-/emojis/2/snail.png" width="20px" height="20px" style="display:inline;margin:0;border:0;padding:0;" title=":snail:" alt=":snail:">
code, send them some <img src="https://gitlab.com/gitlab-org/gitlab-foss/raw/master/public/-/emojis/2/birthday.png" width="20px" height="20px" style="display:inline;margin:0;border:0;padding:0;" title=":birthday:" alt=":birthday:">.
People <img src="https://gitlab.com/gitlab-org/gitlab-foss/raw/master/public/-/emojis/2/heart.png" width="20px" height="20px" style="display:inline;margin:0;border:0;padding:0;" title=":heart:" alt=":heart:">
you for that.

If you're new to this, don't be <img src="https://gitlab.com/gitlab-org/gitlab-foss/raw/master/public/-/emojis/2/fearful.png" width="20px" height="20px" style="display:inline;margin:0;border:0;padding:0;" title=":fearful:" alt=":fearful:">.
You can join the emoji <img src="https://gitlab.com/gitlab-org/gitlab-foss/raw/master/public/-/emojis/2/family.png" width="20px" height="20px" style="display:inline;margin:0;border:0;padding:0;" title=":family:" alt=":family:">.
Just look up one of the supported codes.

Consult the [Emoji Cheat Sheet](https://www.webfx.com/tools/emoji-cheat-sheet/) for a list
of all supported emoji codes. <img src="https://gitlab.com/gitlab-org/gitlab-foss/raw/master/public/-/emojis/2/thumbsup.png" width="20px" height="20px" style="display:inline;margin:0;border:0;padding:0;" title=":thumbsup:" alt=":thumbsup:">

The above paragraphs in raw Markdown:
@@ -41,6 +41,7 @@ To set up infrastructure for workspaces:
||||
> - **Git reference** and **Devfile location** [introduced](https://gitlab.com/gitlab-org/gitlab/-/issues/392382) in GitLab 16.10.
|
||||
> - **Time before automatic termination** [renamed](https://gitlab.com/gitlab-org/gitlab/-/issues/392382) to **Workspace automatically terminates after** in GitLab 16.10.
|
||||
> - **Variables** [introduced](https://gitlab.com/gitlab-org/gitlab/-/issues/463514) in GitLab 17.1.
|
||||
|
||||
Prerequisites:
|
||||
|
||||
|
|
@ -66,6 +67,8 @@ To create a workspace:
|
|||
If your devfile is not in the root directory of your project, specify a relative path. A minimal devfile sketch follows these steps.
1. In **Workspace automatically terminates after**, enter the number of hours until the workspace automatically terminates.
   This timeout is a safety measure to prevent a workspace from consuming excessive resources or running indefinitely.
1. In **Variables**, enter the keys and values of the environment variables you want to inject into the workspace.
   To add a new variable, select **Add variable**.
1. Select **Create workspace**.

The workspace might take a few minutes to start.
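For reference, a minimal devfile sketch is shown below. The container image is a placeholder and the exact attributes required can vary by GitLab version, so treat this as an illustrative starting point rather than a validated configuration.

```yaml
# Hypothetical minimal .devfile.yaml for a workspace. The image is a
# placeholder; use a tooling image appropriate for your project.
schemaVersion: 2.2.0
components:
  - name: tooling-container
    attributes:
      gl/inject-editor: true
    container:
      image: registry.gitlab.com/my-group/my-project/tooling:latest
```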
@@ -33448,15 +33448,18 @@ msgstr ""

msgid "MlModelRegistry|Drop to start upload"
msgstr ""
msgid "MlModelRegistry|Enter a description for this version of the model."
msgstr ""

msgid "MlModelRegistry|Enter a model description"
msgstr ""

msgid "MlModelRegistry|Enter a semver version."
msgstr ""

msgid "MlModelRegistry|Enter some description"
msgstr ""

msgid "MlModelRegistry|Experiment"
msgstr ""
@@ -33478,6 +33481,9 @@ msgstr ""

msgid "MlModelRegistry|For example 1.0.0"
msgstr ""
msgid "MlModelRegistry|For example 1.0.0. Must be a semantic version."
msgstr ""

msgid "MlModelRegistry|For example my-model"
msgstr ""
@@ -33490,9 +33496,6 @@ msgstr ""

msgid "MlModelRegistry|Info"
msgstr ""

msgid "MlModelRegistry|Latest version"
msgstr ""
@@ -33595,10 +33598,10 @@ msgstr ""

msgid "MlModelRegistry|Using the MLflow client"
msgstr ""

msgid "MlModelRegistry|Version candidates"
msgstr ""

msgid "MlModelRegistry|Version description"
msgstr ""

msgid "MlModelRegistry|Versions"