---
stage: AI-Powered
group: Custom Models
description: Set up your self-hosted model infrastructure
info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://handbook.gitlab.com/handbook/product/ux/technical-writing/#assignments
---

# Set up your self-hosted model infrastructure

DETAILS:
**Tier:** For a limited time, Premium and Ultimate. In the future, [GitLab Duo Enterprise](../../subscriptions/subscription-add-ons.md).
**Offering:** Self-managed
**Status:** Beta

> - [Introduced](https://gitlab.com/groups/gitlab-org/-/epics/12972) in GitLab 17.1 [with a flag](../../administration/feature_flags.md) named `ai_custom_model`. Disabled by default.

FLAG:
The availability of this feature is controlled by a feature flag.
For more information, see the history.

When you self-host the model, the AI Gateway, and the GitLab instance, no calls are
made to external architecture, so your data stays within your own infrastructure.

To set up your self-hosted model infrastructure:

1. Install the large language model (LLM) serving infrastructure.
1. Configure your GitLab instance.
1. Install the GitLab AI Gateway.

<i class="fa fa-youtube-play youtube" aria-hidden="true"></i>
For an installation video guide, see [Self-Hosted Models Deployment](https://youtu.be/UNmD9-sgUvw).
<!-- Video published on 2024-05-30 -->

<i class="fa fa-youtube-play youtube" aria-hidden="true"></i>
For an installation video guide in French, see [Self-Hosted Models Deployment (French Language version)](https://youtu.be/UNmD9-sgUvw).
<!-- Video published on 2024-05-30 -->

## Install large language model serving infrastructure

Install one of the following GitLab-approved LLMs:

| Model                                                                              | Code completion | Code generation | GitLab Duo Chat |
|------------------------------------------------------------------------------------|-----------------|-----------------|---------|
| [CodeGemma 2b](https://huggingface.co/google/codegemma-2b)                         | **{check-circle}** Yes               | **{dotted-circle}** No               | **{dotted-circle}** No        |
| [CodeGemma 7b-it](https://huggingface.co/google/codegemma-7b-it) (Instruction)     | **{dotted-circle}** No                | **{check-circle}** Yes               | **{dotted-circle}** No        |
| [CodeGemma 7b-code](https://huggingface.co/google/codegemma-7b) (Code)             | **{check-circle}** Yes               | **{dotted-circle}** No               | **{dotted-circle}** No        |
| [Code-Llama 13b-code](https://huggingface.co/meta-llama/CodeLlama-13b-hf)          | **{check-circle}** Yes               | **{dotted-circle}** No               | **{dotted-circle}** No        |
| [Code-Llama 13b](https://huggingface.co/meta-llama/CodeLlama-13b-Instruct-hf)      | **{dotted-circle}** No                | **{check-circle}** Yes               | **{dotted-circle}** No        |
| [Codestral 22B](https://huggingface.co/mistralai/Codestral-22B-v0.1) (see [setup instructions](litellm_proxy_setup.md#example-setup-for-codestral-with-ollama))                                         | **{check-circle}** Yes               | **{check-circle}** Yes               | **{dotted-circle}** No        |
| [Mistral 7B](https://huggingface.co/mistralai/Mistral-7B-v0.1)                     | **{dotted-circle}** No                | **{check-circle}** Yes               | **{check-circle}** Yes        |
| [Mixtral 8x22B](https://huggingface.co/mistral-community/Mixtral-8x22B-v0.1)       | **{dotted-circle}** No                | **{check-circle}** Yes               | **{check-circle}** Yes        |
| [Mixtral 8x7B](https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1)        | **{dotted-circle}** No                | **{check-circle}** Yes               | **{check-circle}** Yes        |

### Use a serving architecture

You should use one of the following serving architectures with your
installed LLM:

- [vLLM](https://docs.vllm.ai/en/stable/)
- [TensorRT-LLM](https://docs.mistral.ai/deployment/self-deployment/overview/)
- [Ollama and litellm](litellm_proxy_setup.md)

#### Litellm config examples for quickly getting started with Ollama

```yaml
model_list:
  - model_name: mistral
    litellm_params:
      model: ollama/mistral:latest
      api_base: YOUR_HOSTING_SERVER
  - model_name: mixtral
    litellm_params:
      model: ollama/mixtral:latest
      api_base: YOUR_HOSTING_SERVER
  - model_name: codegemma
    litellm_params:
      model: ollama/codegemma
      api_base: YOUR_HOSTING_SERVER
  - model_name: codestral
    litellm_params:
      model: ollama/codestral
      api_base: YOUR_HOSTING_SERVER
  - model_name: codellama
    litellm_params:
      model: ollama/codellama:13b
      api_base: YOUR_HOSTING_SERVER
  - model_name: codellama_13b_code
    litellm_params:
      model: ollama/codellama:code
      api_base: YOUR_HOSTING_SERVER
  - model_name: deepseekcoder
    litellm_params:
      model: ollama/deepseekcoder
      api_base: YOUR_HOSTING_SERVER
  - model_name: mixtral_8x22b
    litellm_params:
      model: ollama/mixtral:8x22b
      api_base: YOUR_HOSTING_SERVER
  - model_name: codegemma_2b
    litellm_params:
      model: ollama/codegemma:2b
      api_base: YOUR_HOSTING_SERVER
  - model_name: codegemma_7b
    litellm_params:
      model: ollama/codegemma:code
      api_base: YOUR_HOSTING_SERVER
```
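
After the proxy is running with a configuration like the one above, a single chat request can confirm that a model alias resolves. The following is a minimal sketch, not an official procedure. It assumes the LiteLLM proxy exposes its OpenAI-compatible `/v1/chat/completions` route at a base URL you supply and that no API key is configured. `litellm_smoke_test` is a hypothetical helper name.

```shell
# Hypothetical helper: send one chat request through the LiteLLM proxy.
#   $1: base URL of the proxy (assumption, for example http://litellm.example.com:4000)
#   $2: a model_name value from the config, for example "mistral"
litellm_smoke_test() {
  curl --silent --show-error --fail \
    --header "Content-Type: application/json" \
    --data "{\"model\": \"$2\", \"messages\": [{\"role\": \"user\", \"content\": \"Say hello\"}]}" \
    "$1/v1/chat/completions"
}
```

For example, `litellm_smoke_test http://litellm.example.com:4000 mistral` should print a JSON response when both the proxy and Ollama are reachable. If your proxy requires an API key, add the matching `Authorization` header.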

## Configure your GitLab instance

1. So that the GitLab instance can reach the AI Gateway, set the `AI_GATEWAY_URL`
   environment variable in the environment of your GitLab instance:

   ```shell
   AI_GATEWAY_URL=https://<your_ai_gitlab_domain>
   ```

1. Where your GitLab instance is installed, [run the following Rake task](../../raketasks/index.md)
   to activate GitLab Duo features:

   ```shell
   sudo gitlab-rake gitlab:duo:enable_feature_flags
   ```

1. [Start a GitLab Rails console](../feature_flags.md#start-the-gitlab-rails-console):

   ```shell
   sudo gitlab-rails console
   ```

   In the console, enable the `ai_custom_model` feature flag:

   ```ruby
   Feature.enable(:ai_custom_model)
   ```

   Exit the Rails console.
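
For scripted setups, the feature flag can also be enabled without an interactive console. The following is a sketch that assumes a Linux package installation where `gitlab-rails runner` is available. `enable_ai_custom_model` is a hypothetical helper name.

```shell
# Hypothetical helper: enable the ai_custom_model flag non-interactively,
# then print whether the flag is now enabled.
enable_ai_custom_model() {
  sudo gitlab-rails runner \
    'Feature.enable(:ai_custom_model); puts Feature.enabled?(:ai_custom_model)'
}
```

Calling `enable_ai_custom_model` on the GitLab host should print `true` if the flag was enabled.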

## Install the GitLab AI Gateway

### Install by using Docker

Prerequisites:

- Install a Docker container engine, such as [Docker](https://docs.docker.com/engine/install/#server).
- Use a valid hostname accessible within your network. Do not use `localhost`.

The GitLab AI Gateway Docker image contains all necessary code and dependencies
in a single container.

#### Find the AI Gateway release

Find the GitLab official Docker image at:

- [AI Gateway Docker image on Container Registry](https://gitlab.com/gitlab-org/modelops/applied-ml/code-suggestions/ai-assist/container_registry/).
- [AI Gateway Docker image on DockerHub](https://hub.docker.com/repository/docker/gitlab/model-gateway/tags).
- [Release process for self-hosted AI Gateway](https://gitlab.com/gitlab-org/modelops/applied-ml/code-suggestions/ai-assist/-/blob/main/docs/release.md).

Use the image tag that corresponds to your GitLab version. For example, if the
GitLab version is `v17.4.0`, use the `self-hosted-v17.4.0-ee` tag.

For version `v17.3.0-ee`, use the image tag `gitlab-v17.3.0`.
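
The tag naming rule above can be sketched as a small helper. This is an illustration only: `aigw_image_tag` is a hypothetical name, the `self-hosted-v<version>-ee` pattern is taken from the 17.4.0 example, and the `gitlab-v<version>` pattern for earlier versions is extrapolated from the single 17.3.0 example. Always confirm that the tag exists in the registry before you use it.

```shell
# Hypothetical helper: derive an AI Gateway image tag from a GitLab version.
#   $1: GitLab version, with or without a leading "v" (for example v17.4.0)
aigw_image_tag() {
  version=${1#v}          # strip a leading "v", if present
  major=${version%%.*}    # "17" from "17.4.0"
  rest=${version#*.}      # "4.0" from "17.4.0"
  minor=${rest%%.*}       # "4" from "4.0"
  if [ "$major" -gt 17 ] || { [ "$major" -eq 17 ] && [ "$minor" -ge 4 ]; }; then
    echo "self-hosted-v${version}-ee"
  else
    # Assumption: pattern extrapolated from the 17.3.0 example only.
    echo "gitlab-v${version}"
  fi
}
```

For example, `aigw_image_tag v17.4.0` prints `self-hosted-v17.4.0-ee`.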

WARNING:
Docker for Windows is not officially supported. There are known issues with volume
permissions, and potentially other unknown issues. If you are trying to run on Docker
for Windows, see the [getting help page](https://about.gitlab.com/get-help/) for links
to community resources (such as IRC or forums) to seek help from other users.

#### Set up the volumes location

Create a directory where the logs will reside on the Docker host. It can be under
your user's home directory (for example `~/gitlab-agw`), or in a directory like
`/srv/gitlab-agw`. To create that directory, run:

```shell
sudo mkdir -p /srv/gitlab-agw
```

If you're running Docker with a user other than `root`, ensure appropriate
permissions have been granted to that directory.
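
One way to check the permissions is to run a small script as the user that runs Docker. The following is a sketch under that assumption. `prepare_aigw_log_dir` is a hypothetical helper name.

```shell
# Hypothetical helper: create the log directory if needed and confirm
# that the current user can write to it.
#   $1: directory path (defaults to the /srv/gitlab-agw example)
prepare_aigw_log_dir() {
  log_dir=${1:-/srv/gitlab-agw}
  mkdir -p "$log_dir" || return 1
  if [ -w "$log_dir" ]; then
    echo "ok: $log_dir is writable by $(id -un)"
  else
    echo "error: $log_dir is not writable by $(id -un)" >&2
    return 1
  fi
}
```

If the check fails for a directory that `root` created, changing its ownership (for example with `sudo chown`) is one possible fix.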

#### Optional: Download documentation index

NOTE:
This only applies to AI Gateway image tag `gitlab-17.3.0-ee` and earlier. For images with tag `self-hosted-v17.4.0-ee` and later, documentation search is embedded into the Docker image.

To improve results when asking GitLab Duo Chat questions about GitLab, you can
index GitLab documentation and provide it as a file to the AI Gateway.

To index the documentation in your local installation, run:

```shell
pip install requests langchain langchain_text_splitters
python3 scripts/custom_models/create_index.py -o <path_to_created_index/docs.db>
```

This creates a file `docs.db` at the specified path.

You can also create an index for a specific GitLab version:

```shell
python3 scripts/custom_models/create_index.py --version_tag="{gitlab-version}"
```

#### Start a container from the image

For Docker images with tag `self-hosted-v17.4.0-ee` and later, run the following:

```shell
docker run -e AIGW_GITLAB_URL=<your_gitlab_instance> <image>
```

For Docker images with tag `gitlab-17.3.0-ee` or `gitlab-17.2.0`, run:

```shell
docker run -e AIGW_CUSTOM_MODELS__ENABLED=true \
   -v /path/to/created/index/docs.db:/app/tmp/docs.db \
   -e AIGW_FASTAPI__OPENAPI_URL="/openapi.json" \
   -e AIGW_AUTH__BYPASS_EXTERNAL=true \
   -e AIGW_FASTAPI__DOCS_URL="/docs" \
   -e AIGW_FASTAPI__API_PORT=5052 \
   <image>
```

The `AIGW_FASTAPI__OPENAPI_URL` and `AIGW_FASTAPI__DOCS_URL` environment variables
are optional, but are useful for debugging. If the container port is reachable from
the host, `http://localhost:5052/docs` opens the AI Gateway API documentation.
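
Other steps on this page refer to a container named `gitlab-aigw`, and the debugging URL requires port 5052 to be reachable from the host. The following sketch shows one way to start the container with both in place. It assumes the gateway listens on port 5052 inside the container. `start_aigw` is a hypothetical helper name.

```shell
# Hypothetical helper: start the gateway as a named, detached container
# with its port published to the host.
#   $1: image reference, including the tag
#   $2: URL of your GitLab instance
start_aigw() {
  docker run --detach \
    --name gitlab-aigw \
    --publish 5052:5052 \
    -e AIGW_GITLAB_URL="$2" \
    "$1"
}
```

Call it with your own image reference and GitLab URL. Prefix `docker` with `sudo` if your user is not allowed to run Docker directly.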

### Install by using Docker Engine

1. For the AI Gateway to access the API, it must know where the GitLab instance
   is located. To do this, set the environment variables `AIGW_GITLAB_URL` and
   `AIGW_GITLAB_API_URL`:

   ```shell
   AIGW_GITLAB_URL=https://<your_gitlab_domain>
   AIGW_GITLAB_API_URL=https://<your_gitlab_domain>/api/v4/
   ```

1. [Configure the GitLab instance](#configure-your-gitlab-instance).

1. After you've set up the environment variables, [run the image](#start-a-container-from-the-image).

1. Track the initialization process:

   ```shell
   sudo docker logs -f gitlab-aigw
   ```

After starting the container, visit `gitlab-aigw.example.com`. It might take
a while before the Docker container starts to respond to queries.
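
Both variables derive from the same base URL, so they can be generated together. The following is a minimal sketch. `aigw_env` is a hypothetical helper name, and the file name `aigw.env` in the usage note is an assumption.

```shell
# Hypothetical helper: print both variables for a given GitLab base URL.
#   $1: base URL of the GitLab instance, for example https://gitlab.example.com
aigw_env() {
  base_url=${1%/}   # drop one trailing slash, if present
  printf 'AIGW_GITLAB_URL=%s\n' "$base_url"
  printf 'AIGW_GITLAB_API_URL=%s/api/v4/\n' "$base_url"
}
```

For example, `aigw_env https://gitlab.example.com > aigw.env` writes a file that you can pass to the container with `docker run --env-file aigw.env <image>`.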

### Install by using the AI Gateway Helm chart

Prerequisites:

- You must have a:
  - Domain you own, that you can add a DNS record to.
  - Kubernetes cluster.
  - Working installation of `kubectl`.
  - Working installation of Helm, version v3.11.0 or later.

For more information, see [Test the GitLab chart on GKE or EKS](https://docs.gitlab.com/charts/quickstart/index.html).

#### Add the AI Gateway Helm repository

Add the AI Gateway Helm repository to Helm's configuration:

```shell
helm repo add ai-gateway \
https://gitlab.com/api/v4/projects/gitlab-org%2fcharts%2fai-gateway-helm-chart/packages/helm/devel
```

#### Install the AI Gateway

1. Create the `ai-gateway` namespace:

   ```shell
   kubectl create namespace ai-gateway
   ```

1. Generate the certificate for the domain where you plan to expose the AI Gateway.
1. Create the TLS secret in the previously created namespace:

   ```shell
   kubectl -n ai-gateway create secret tls ai-gateway-tls --cert="<path_to_cert>" --key="<path_to_cert_key>"
   ```

1. For the AI Gateway to access the API, it must know where the GitLab instance
   is located. To do this, set the `gitlab.url` and `gitlab.apiUrl` together with
   the `ingress.hosts` and `ingress.tls` values as follows:

   ```shell
   helm repo add ai-gateway \
     https://gitlab.com/api/v4/projects/gitlab-org%2fcharts%2fai-gateway-helm-chart/packages/helm/devel
   helm repo update

   helm upgrade --install ai-gateway \
     ai-gateway/ai-gateway \
     --version 0.1.1 \
     --namespace=ai-gateway \
     --set="gitlab.url=https://<your_gitlab_domain>" \
     --set="gitlab.apiUrl=https://<your_gitlab_domain>/api/v4/" \
     --set "ingress.enabled=true" \
     --set "ingress.hosts[0].host=<your_gateway_domain>" \
     --set "ingress.hosts[0].paths[0].path=/" \
     --set "ingress.hosts[0].paths[0].pathType=ImplementationSpecific" \
     --set "ingress.tls[0].secretName=ai-gateway-tls" \
     --set "ingress.tls[0].hosts[0]=<your_gateway_domain>" \
     --set="ingress.className=nginx" \
     --timeout=300s --wait --wait-for-jobs
   ```

This step can take a few seconds while all resources are allocated and the
AI Gateway starts.

Wait for your pods to get up and running:

```shell
kubectl wait pod \
  --all \
  --for=condition=Ready \
  --namespace=ai-gateway \
  --timeout=300s
```

When your pods are up and running, you can set up your IP ingresses and DNS records.
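
The `--set` flags passed to `helm upgrade --install` earlier in this section can also be kept in a values file, which is easier to review and keep in version control. The following sketch writes such a file. The keys mirror the `--set` flags exactly. `write_aigw_values` and the output file name are hypothetical.

```shell
# Hypothetical helper: write a Helm values file equivalent to the --set flags.
#   $1: GitLab domain
#   $2: AI Gateway domain
#   $3: output file path
write_aigw_values() {
  cat > "$3" <<EOF
gitlab:
  url: https://$1
  apiUrl: https://$1/api/v4/
ingress:
  enabled: true
  className: nginx
  hosts:
    - host: $2
      paths:
        - path: /
          pathType: ImplementationSpecific
  tls:
    - secretName: ai-gateway-tls
      hosts:
        - $2
EOF
}
```

After running, for example, `write_aigw_values gitlab.example.com gateway.example.com values.yaml`, pass `--values values.yaml` to `helm upgrade --install` in place of the individual `--set` flags.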

#### Configure the GitLab instance

[Configure the GitLab instance](#configure-your-gitlab-instance).

With those steps completed, your Helm chart installation is complete.

## Upgrade the AI Gateway Docker image

To upgrade the AI Gateway, download the newest Docker image tag.

1. Stop the running container:

   ```shell
   sudo docker stop gitlab-aigw
   ```

1. Remove the existing container:

   ```shell
   sudo docker rm gitlab-aigw
   ```

1. Pull and [run the new image](#start-a-container-from-the-image).

1. Ensure that the environment variables are all set correctly.
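
The upgrade steps can be combined into one script. The following is a sketch under stated assumptions: the container is named `gitlab-aigw`, as in the commands above, and only `AIGW_GITLAB_URL` needs to be passed. `upgrade_aigw` is a hypothetical helper name. Unlike the steps above, it pulls the new image first, so a failed download leaves the old container running.

```shell
# Hypothetical helper: replace the running gateway container with a newer image.
#   $1: new image reference, including the tag
#   $2: URL of your GitLab instance
upgrade_aigw() {
  docker pull "$1" || return 1   # stop here if the new image cannot be fetched
  docker stop gitlab-aigw
  docker rm gitlab-aigw
  docker run --detach --name gitlab-aigw -e AIGW_GITLAB_URL="$2" "$1"
}
```

Add any other environment variables or volumes from your original `docker run` command before you use a script like this.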

## Alternative installation methods

For information on alternative ways to install the AI Gateway, see
[issue 463773](https://gitlab.com/gitlab-org/gitlab/-/issues/463773).

## Troubleshooting

First, run the [debugging scripts](troubleshooting.md#use-debugging-scripts) to
verify your self-hosted model setup.

For more information on other actions to take, see the
[troubleshooting documentation](troubleshooting.md).