---
stage: Data Access
group: Database Frameworks
info: Any user with at least the Maintainer role can merge updates to this content.
title: Large tables limitations
---

GitLab enforces limitations on schema changes to large database tables to improve manageability for both GitLab and its customers. The list of tables subject to these limitations is defined in [`rubocop/rubocop-migrations.yml`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/rubocop/rubocop-migrations.yml).

## Table size restrictions

The following limitations apply to table schema changes on GitLab.com:

| Limitation | Maximum table size after the action (including indexes and column size) |
| ------ | ------------------------------- |
| Cannot add an index | 50 GB |
| Cannot add a column with foreign key | 50 GB |
| Cannot add a new column | 100 GB |

These limitations align with our goal to maintain [all tables under 100 GB](https://handbook.gitlab.com/handbook/engineering/architecture/design-documents/database_size_limits/) for improved [stability and performance](https://handbook.gitlab.com/handbook/engineering/architecture/design-documents/database_size_limits/#motivation-gitlabcom-stability-and-performance).
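
As a rough illustration of how to read the table, the following Python sketch checks whether a proposed change keeps a table within its limit. The function name, the dictionary keys, and the reading that a change is blocked only when the projected size exceeds the limit are assumptions made for this example. It is not how GitLab enforces the limits.

```python
# Hypothetical helper for illustration only; not part of the GitLab codebase.

# Maximum table size in GB after the action, copied from the table above.
SIZE_LIMITS_GB = {
    "add_index": 50,
    "add_column_with_foreign_key": 50,
    "add_column": 100,
}


def is_change_allowed(action: str, projected_size_gb: float) -> bool:
    """Return True if the table stays within the limit for this action.

    projected_size_gb is the estimated total table size after the change,
    including indexes and the new column.
    """
    return projected_size_gb <= SIZE_LIMITS_GB[action]


print(is_change_allowed("add_index", 42))   # True
print(is_change_allowed("add_index", 75))   # False
print(is_change_allowed("add_column", 75))  # True
```

In this reading, a 75 GB table can still receive a plain column but not a new index or a column with a foreign key.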

## Exceptions

Exceptions to these size limitations should only be granted for the following cases:

- Migrate a table's columns from `int4` to `int8`
- Add a sharding key to support cells
- Modify a table to assist in partitioning or data retention efforts
- Replace an existing index to provide better query performance

### Requesting an exception

To request an exception to these limitations:

1. Create a new issue using the [Database Team Tasks template](https://gitlab.com/gitlab-org/database-team/team-tasks/-/issues/new?issuable_template=schema_change_exception).
1. Select the `schema_change_exception` template.
1. Provide detailed justification for why your case requires an exception.
1. Wait for review and approval from the Database team before proceeding.
1. Link the approval issue when disabling the cop for your migration.

## Techniques to reduce table size

Before requesting an exception, consider these approaches to manage table size:

### Archiving data

- Move old, infrequently accessed data to archive tables
- Implement archiving workers for automated data migration
- Consider partitioning by date to facilitate archiving (see [date range partitioning](partitioning/date_range.md))
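
The first two points can be sketched as a single function that an archiving worker might call. This is a minimal sketch under stated assumptions: SQLite stands in for PostgreSQL, and the `events` and `events_archive` tables and the `archive_old_rows` function are hypothetical names introduced for this example.

```python
import sqlite3


def archive_old_rows(conn: sqlite3.Connection, cutoff: str) -> int:
    """Move rows created before `cutoff` into the archive table.

    Returns the number of rows moved. The copy and the delete run in one
    transaction, so a row is never lost or duplicated if either step fails.
    """
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO events_archive (id, payload, created_at) "
            "SELECT id, payload, created_at FROM events WHERE created_at < ?",
            (cutoff,),
        )
        deleted = conn.execute(
            "DELETE FROM events WHERE created_at < ?", (cutoff,)
        )
        return deleted.rowcount


# Hypothetical schema and data, for demonstration only.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT, created_at TEXT);
    CREATE TABLE events_archive (id INTEGER PRIMARY KEY, payload TEXT, created_at TEXT);
    INSERT INTO events VALUES
      (1, 'old', '2020-01-01'),
      (2, 'old', '2021-06-01'),
      (3, 'recent', '2024-03-01');
    """
)

moved = archive_old_rows(conn, "2023-01-01")
print(moved)  # 2
```

On a large production table, the copy and delete would normally be split into smaller batches rather than run as one statement each.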

### Data retention

- Implement retention policies to remove old data
- Configure automated cleanup jobs for expired data (see [deleting old pipelines](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/171142))
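
A cleanup job of this kind could look roughly like the sketch below. Deleting in small batches is an assumption of this example, chosen to keep each transaction short. The `job_logs` table and the `delete_expired` function are hypothetical, and SQLite stands in for PostgreSQL.

```python
import sqlite3


def delete_expired(conn: sqlite3.Connection, cutoff: str, batch_size: int = 1000) -> int:
    """Delete rows older than `cutoff` in batches; return the total deleted."""
    total = 0
    while True:
        with conn:  # one short transaction per batch
            cur = conn.execute(
                "DELETE FROM job_logs WHERE id IN ("
                "  SELECT id FROM job_logs WHERE created_at < ? ORDER BY id LIMIT ?"
                ")",
                (cutoff, batch_size),
            )
        if cur.rowcount == 0:
            return total
        total += cur.rowcount


# Hypothetical table with four expired rows and one recent row.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE job_logs (id INTEGER PRIMARY KEY, created_at TEXT)")
conn.executemany(
    "INSERT INTO job_logs (created_at) VALUES (?)",
    [("2020-01-01",), ("2020-02-01",), ("2020-03-01",), ("2020-04-01",), ("2024-05-01",)],
)
conn.commit()

deleted = delete_expired(conn, "2023-01-01", batch_size=2)
print(deleted)  # 4
```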

### Table partitioning

- [Partition large tables by date](scalability/patterns/time_decay.md#time-decay-data-strategies), ID ranges, or other criteria
- Consider [range](partitioning/date_range.md) or [list](partitioning/list.md) partitioning based on access patterns

### Column optimization

- Use appropriate data types (for example, `smallint` instead of `integer` when possible)
- Remove unused or redundant indexes
- Consider using `NULL` instead of empty strings or zeros
- Use `text` instead of `varchar` to [avoid storage overhead](ordering_table_columns.md)

### Normalization

- Split large tables into related smaller tables
- Move rarely used columns to [separate tables](layout_and_access_patterns.md#data-model-trade-offs)
- Use junction tables for many-to-many relationships
- Consider vertical partitioning for [wide tables](layout_and_access_patterns.md#wide-tables)

### External storage

- Move large text or binary data to object storage
- Store only metadata in the database
- Use [Elasticsearch](../../user/search/advanced_search.md) for search-specific data
- Consider using Redis for temporary or cached data
## Alternatives to table modifications

Consider these alternatives when working with large tables:

1. Create a separate table for new columns, especially if the column is not present in all rows. The new table references the original table through a foreign key.
1. Work with the Global Search team to add your data to Elasticsearch for enhanced filter/search functionality.
1. Simplify filtering/sorting options (for example, use `id` instead of `created_at` for sorting).

## Benefits of table size limitations

Table size limitations provide several advantages:

- Enable separate vacuum operations with different frequencies
- Generate less Write-Ahead Log (WAL) data for column updates
- Prevent unnecessary data copying during row updates

For more information about data model trade-offs, see the [database documentation](layout_and_access_patterns.md#data-model-trade-offs).

## Using `has_one` relationships

When a table becomes too large for new columns, create a new table with a `has_one` relation. For example, in [merge request !170371](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/170371), we track the total weight count of an issue in a separate table.

Benefits of this approach:

1. Keeps the main table narrower, reducing the amount of data loaded from PostgreSQL
1. Creates an efficient narrow table for specific queries
1. Allows selective population of the new table as needed

This approach is particularly effective when:

- The new column applies to a subset of the main table
- Only specific queries need the new data

Disadvantages:

1. More tables may result in more joins, which complicates queries
1. Queries with multiple joins may be hard to optimize
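
The data model behind this approach can be sketched as follows. This example uses plain SQL through SQLite in place of Active Record and PostgreSQL. The `issues` and `issue_weight_totals` tables and their columns are hypothetical names for illustration, not the schema from the merge request above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    -- Stand-in for a large, wide table that should not gain new columns.
    CREATE TABLE issues (id INTEGER PRIMARY KEY, title TEXT NOT NULL);

    -- Narrow side table: the primary key allows at most one row per issue,
    -- and the foreign key links each row back to the main table.
    CREATE TABLE issue_weight_totals (
        issue_id INTEGER PRIMARY KEY REFERENCES issues (id) ON DELETE CASCADE,
        total_weight INTEGER NOT NULL
    );

    INSERT INTO issues VALUES (1, 'Issue with weights'), (2, 'Plain issue');
    -- Only issues that need the value get a row in the side table.
    INSERT INTO issue_weight_totals VALUES (1, 13);
    """
)

# Queries that need the extra data pay for a join.
# All other queries keep reading the main table alone.
rows = conn.execute(
    """
    SELECT issues.id, issues.title, issue_weight_totals.total_weight
    FROM issues
    LEFT JOIN issue_weight_totals ON issue_weight_totals.issue_id = issues.id
    ORDER BY issues.id
    """
).fetchall()
print(rows)  # [(1, 'Issue with weights', 13), (2, 'Plain issue', None)]
```

The second issue has no row in the side table, which shows the selective population benefit. The `LEFT JOIN` shows the extra query complexity listed under disadvantages.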

## Related links

- [Database size limits](https://handbook.gitlab.com/handbook/engineering/architecture/design-documents/database_size_limits/#solutions)
- [Adding database indexes](adding_database_indexes.md)
- [Database layout and access patterns](layout_and_access_patterns.md#data-model-trade-offs)
- [Data retention guidelines for feature development](../data_retention_policies.md)