| stage | group | info |
|---|---|---|
| Data Stores | Database | To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments |
CI mirrored tables
Problem statement
The database decomposition work set out to split the single database that GitLab uses
into two databases: main and ci. A major challenge of that work was
removing all joins between the main and the ci tables,
because PostgreSQL doesn't support joins between tables that belong to different databases.
However, some core application models in the main database are queried very often by the CI side.
For example:
- Namespace, in the namespaces table.
- Project, in the projects table.
Not being able to join against these tables is a significant obstacle. The team chose to perform logical
replication of those tables from the main database to the CI database, into two new tables:
- ci_namespace_mirrors, as a mirror of the namespaces table.
- ci_project_mirrors, as a mirror of the projects table.
This logical replication means two things:
- The main database tables can be queried and joined to the namespaces and projects tables.
- The ci database tables can be joined with the ci_namespace_mirrors and ci_project_mirrors tables.
graph LR
subgraph "Main database (tables)"
A[namespaces] -->|updates| B[namespaces_sync_events]
A -->|deletes| C[loose_foreign_keys_deleted_records]
D[projects] -->|deletes| C
D -->|updates| E[projects_sync_events]
end
B --> F
C --> G
E --> H
subgraph "Sidekiq worker jobs"
F[Namespaces::ProcessSyncEventsWorker]
G[LooseForeignKeys::CleanupWorker]
H[Projects::ProcessSyncEventsWorker]
end
F -->|do update| I
G -->|delete records| I
G -->|delete records| J
H -->|do update| J
subgraph "CI database (tables)"
I[ci_namespace_mirrors]
J[ci_project_mirrors]
end
This replication is restricted to the few attributes that are needed from each model:
- From Namespace, we replicate traversal_ids.
- From Project, we replicate only the namespace_id, which represents the group that the project belongs to.
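The following is a minimal in-memory sketch of how a CI-side lookup could rely on the mirrored attributes alone. It is not GitLab's implementation. The arrays stand in for the mirror tables, the method name project_ids_under is hypothetical, and the sketch assumes that traversal_ids holds the IDs of a namespace's ancestors followed by its own ID.

```ruby
# Hypothetical in-memory stand-ins for the two mirror tables in the CI database.
# Assumption: traversal_ids lists a namespace's ancestor IDs followed by its own ID.
CI_NAMESPACE_MIRRORS = [
  { namespace_id: 1, traversal_ids: [1] },     # top-level group
  { namespace_id: 2, traversal_ids: [1, 2] },  # subgroup of group 1
  { namespace_id: 3, traversal_ids: [3] }      # unrelated top-level group
].freeze

CI_PROJECT_MIRRORS = [
  { project_id: 10, namespace_id: 1 },
  { project_id: 11, namespace_id: 2 },
  { project_id: 12, namespace_id: 3 }
].freeze

# Returns the IDs of all projects in the given group or any of its subgroups,
# using only CI-side data (no access to the namespaces or projects tables).
def project_ids_under(group_id)
  namespace_ids = CI_NAMESPACE_MIRRORS
    .select { |row| row[:traversal_ids].include?(group_id) }
    .map { |row| row[:namespace_id] }

  CI_PROJECT_MIRRORS
    .select { |row| namespace_ids.include?(row[:namespace_id]) }
    .map { |row| row[:project_id] }
end
```

In a real query, the same lookup would be a join between CI tables and the two mirror tables, all inside the CI database.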
Keeping the CI mirrored tables in sync with the source tables
We must handle three types of events to keep the source and the target tables in sync:
- Creation of new namespaces or projects.
- Updates to namespaces or projects.
- Deletion of namespaces or projects.
graph TD
subgraph "CI database (tables)"
E[other CI tables]
F{queries with joins allowed}
G[ci_project_mirrors]
H[ci_namespace_mirrors]
E---F
F---G
F---H
end
A---B
B---C
B---D
L["⛔ ← Joins are not allowed → ⛔"]
subgraph "Main database (tables)"
A[other main tables]
B{queries with joins allowed}
C[projects]
D[namespaces]
end
Create and update
Syncing the data of newly created or updated namespaces or projects happens in this order:
- On the main database: Any INSERT or UPDATE on the namespaces or projects tables adds an entry to the tables namespaces_sync_events and projects_sync_events. These tables also exist on the main database. These entries are added by triggers on both of the tables.
- On the model level: After a commit happens on either of the source models Namespace or Project, it schedules the corresponding Sidekiq jobs Namespaces::ProcessSyncEventsWorker or Projects::ProcessSyncEventsWorker to run.
- These workers then:
  - Read the entries from the tables (namespaces/project)_sync_events from the main database, to check which namespaces or projects to sync.
  - Copy the data for any updated records into the target tables ci_namespace_mirrors, ci_project_mirrors.
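The create and update flow can be sketched with plain Ruby objects. This is an illustrative simulation, not the actual workers or triggers. The class SyncSimulation and its methods are hypothetical, only the namespaces side is modeled, and the sketch assumes that processed events are removed from the events table.

```ruby
# In-memory stand-ins for the tables involved; all structures are illustrative.
class SyncSimulation
  attr_reader :namespaces, :namespaces_sync_events, :ci_namespace_mirrors

  def initialize
    @namespaces = {}              # main database: id => { traversal_ids: [...] }
    @namespaces_sync_events = []  # main database: queue of namespace IDs to sync
    @ci_namespace_mirrors = {}    # CI database: namespace_id => { traversal_ids: [...] }
  end

  # Stands in for an INSERT or UPDATE on namespaces, plus the trigger
  # that records a sync event for the affected row.
  def save_namespace(id, traversal_ids)
    @namespaces[id] = { traversal_ids: traversal_ids }
    @namespaces_sync_events << id
  end

  # Stands in for the worker: read the pending events, copy the current
  # source data into the mirror, then drop the processed events
  # (dropping them is an assumption of this sketch).
  def process_sync_events
    pending = @namespaces_sync_events.uniq
    pending.each do |id|
      source = @namespaces[id]
      next unless source # no source row; deletions are handled by a separate mechanism

      @ci_namespace_mirrors[id] = { traversal_ids: source[:traversal_ids].dup }
    end
    @namespaces_sync_events.clear
    pending.size
  end
end
```

Because the worker copies the current state of the source row instead of replaying each event, several events for the same namespace collapse into a single mirror write.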
Delete
When a namespace or project is deleted, the corresponding records in the mirrored
CI tables are deleted using the loose foreign keys (LFK) mechanism.
Because these tables are listed in config/gitlab_loose_foreign_keys.yml, the LFK mechanism
already handles this case: it deletes any records in the CI mirrored
tables that map to deleted namespaces or projects in the main database.
ci_namespace_mirrors:
  - table: namespaces
    column: namespace_id
    on_delete: async_delete
ci_project_mirrors:
  - table: projects
    column: project_id
    on_delete: async_delete
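As a rough illustration of how such a configuration could drive deletions, the sketch below walks a list of deleted parent records and removes the matching mirror rows. It is a simplified simulation, not the LFK implementation. The LOOSE_FOREIGN_KEYS constant, the cleanup method, and the shape of the deleted-record entries are all assumptions made for this example.

```ruby
# Hypothetical Ruby representation of the configuration shown above.
LOOSE_FOREIGN_KEYS = {
  "ci_namespace_mirrors" => [
    { table: "namespaces", column: "namespace_id", on_delete: "async_delete" }
  ],
  "ci_project_mirrors" => [
    { table: "projects", column: "project_id", on_delete: "async_delete" }
  ]
}.freeze

# deleted_records: array of { table:, id: } entries describing deleted parent rows.
# child_tables:    hash of child table name => array of row hashes (mutated in place).
def cleanup(deleted_records, child_tables)
  deleted_records.each do |record|
    LOOSE_FOREIGN_KEYS.each do |child_table, definitions|
      definitions.each do |definition|
        next unless definition[:table] == record[:table]
        next unless definition[:on_delete] == "async_delete"

        column = definition[:column].to_sym
        # Remove every child row that points at the deleted parent row.
        child_tables.fetch(child_table).reject! { |row| row[column] == record[:id] }
      end
    end
  end
  deleted_records.clear # this sketch treats the deleted-record entries as a processed queue
  child_tables
end
```

The deletion is asynchronous from the point of view of the main database: the parent row is already gone when the mirror row is removed.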
Consistency Checking
To make sure that both syncing mechanisms work as expected, we deploy two extra worker jobs, triggered by cron jobs every few minutes:
- Database::CiNamespaceMirrorsConsistencyCheckWorker
- Database::CiProjectMirrorsConsistencyCheckWorker
These jobs:
- Scan both of the source tables on the main database, using a cursor.
- Compare the items in the namespaces and projects tables with the target tables on the ci database.
- Report the items that are not in sync to Kibana and Prometheus.
- Correct any discrepancies.
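The scan, compare, and correct loop can be sketched as follows. This is a simplified illustration, not the code of the consistency check workers. The method name consistency_check, the batch_size parameter, and the hash-based tables are hypothetical, and reporting is reduced to collecting the mismatched IDs.

```ruby
# source: { id => attributes } rows from the main database.
# mirror: { id => attributes } rows from the CI database (mutated in place).
# Returns the IDs that were found out of sync.
def consistency_check(source, mirror, batch_size: 2)
  mismatched = []
  cursor = 0 # highest source ID that has already been scanned

  loop do
    # Fetch the next batch of source IDs after the cursor.
    batch_ids = source.keys.select { |id| id > cursor }.sort.first(batch_size)
    break if batch_ids.empty?

    batch_ids.each do |id|
      next if mirror[id] == source[id]

      mismatched << id            # a real job would report this to monitoring
      mirror[id] = source[id].dup # correct the discrepancy
    end
    cursor = batch_ids.last
  end

  mismatched
end
```

Because the sketch iterates over the source rows only, it detects missing or stale mirror rows but not mirror rows whose source row no longer exists.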