Gitaly and Gitaly Cluster

Tier: Free, Premium, Ultimate
Offering: Self-managed

Gitaly is present in every GitLab installation and coordinates Git repository storage and retrieval. Gitaly implements a client-server architecture: the Gitaly server runs on the nodes that store repositories, and Gitaly clients such as GitLab Rails, GitLab Workhorse, and GitLab Shell access those repositories through it over gRPC.

Gitaly manages only Git repository access for GitLab. Other types of GitLab data aren't accessed using Gitaly.

GitLab accesses repositories through the configured repository storages. Each new repository is stored on one of the repository storages based on their configured weights. Each repository storage is either:

  • A Gitaly storage with direct access to repositories using storage paths, where each repository is stored on a single Gitaly node. All requests are routed to this node.
  • A virtual storage provided by Gitaly Cluster, where each repository can be stored on multiple Gitaly nodes for fault tolerance. In a Gitaly Cluster:
    • Read requests are distributed between multiple Gitaly nodes, which can improve performance.
    • Write requests are broadcast to repository replicas.
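
New repositories are assigned to a storage in proportion to the configured weights. If more than one storage is configured, the weights can be adjusted through the application settings API. The following is a hedged sketch only; the host, token, storage names, and the repository_storages_weighted parameter shape are assumptions to verify against the API documentation for your version:

```shell
# Hypothetical example: make storage-2 twice as likely as storage-1 to
# receive new repositories. Host, token, and storage names are placeholders.
curl --request PUT \
  --header "PRIVATE-TOKEN: <your-admin-token>" \
  "https://gitlab.example.com/api/v4/application/settings" \
  --data "repository_storages_weighted[storage-1]=50" \
  --data "repository_storages_weighted[storage-2]=100"
```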

Before deploying Gitaly Cluster

Gitaly Cluster provides the benefits of fault tolerance, but comes with the additional complexity of setup and management. Before deploying Gitaly Cluster, review the known issues and snapshot backup limitations described below.

If you have not yet migrated to Gitaly Cluster, you have two options:

  • A sharded Gitaly instance.
  • Gitaly Cluster.

Contact customer support if you have any questions.

Known issues

The following table outlines current known issues impacting the use of Gitaly Cluster. For the current status of these issues, refer to the referenced issues and epics.

  • Gitaly Cluster and Geo: issues retrying failed syncs. If Gitaly Cluster is used on a Geo secondary site, repositories that have failed to sync could continue to fail when Geo tries to resync them. Recovering from this state requires assistance from support to run manual steps. To avoid this in GitLab 15.0 to 15.2, enable the gitaly_praefect_generated_replica_paths feature flag on your Geo primary site. In GitLab 15.3, the feature flag is enabled by default.
  • Praefect unable to insert data into the database because migrations were not applied after an upgrade. If the database is not kept up to date with completed migrations, the Praefect node is unable to perform standard operations. To avoid this, make sure the Praefect database is up and running with all migrations completed. For example, sudo -u git -- /opt/gitlab/embedded/bin/praefect -config /var/opt/gitlab/praefect/config.toml sql-migrate-status should show a list of all applied migrations (see the example after this list). Consider having your upgrade plan reviewed by support.
  • Restoring a Gitaly Cluster node from a snapshot in a running cluster. Because Gitaly Cluster runs with consistent state, introducing a single node that is behind prevents the cluster from reconciling that node's data with the data on the other nodes. To avoid this, don't restore a single Gitaly Cluster node from a backup snapshot. If you must restore from backup:
    1. Shut down GitLab.
    2. Snapshot all Gitaly Cluster nodes at the same time.
    3. Take a database dump of the Praefect database.
  • Limitations when running in Kubernetes, Amazon ECS, or similar. Praefect (Gitaly Cluster) is not supported in these environments, and Gitaly has known limitations. To avoid this, use our reference architectures.
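
For the migration issue above, the applied migrations can be checked from the Praefect node, and a feature flag can be toggled through the features API. The API call below is a hedged sketch; the host and token are placeholders:

```shell
# Check that all Praefect database migrations have been applied
# (default Linux package installation paths).
sudo -u git -- /opt/gitlab/embedded/bin/praefect \
  -config /var/opt/gitlab/praefect/config.toml sql-migrate-status

# Hedged sketch: enable the gitaly_praefect_generated_replica_paths feature
# flag on the Geo primary site using the features API (GitLab 15.0 to 15.2).
curl --request POST \
  --header "PRIVATE-TOKEN: <your-admin-token>" \
  "https://primary.gitlab.example.com/api/v4/features/gitaly_praefect_generated_replica_paths" \
  --data "value=true"
```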

Snapshot backup and recovery limitations

Gitaly Cluster does not support snapshot backups. Snapshot backups can cause issues where the Praefect database becomes out of sync with the disk storage. Because of how Praefect rebuilds the replication metadata of Gitaly disk information during a restore, you should use the official backup and restore Rake tasks.

The incremental backup method can be used to speed up Gitaly Cluster backups.
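
For example, on a Linux package installation the official Rake tasks are run as follows. The incremental options are a hedged sketch based on the general backup documentation; verify them for your GitLab version:

```shell
# Create a full backup using the official backup Rake task.
sudo gitlab-backup create

# Hedged sketch: create an incremental repository backup based on a previous
# backup ID to speed up Gitaly Cluster backups.
sudo gitlab-backup create INCREMENTAL=yes PREVIOUS_BACKUP=<previous-backup-id>
```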

If you are unable to use either method, contact customer support for restoration help.

What to do if you experience an issue or limitation on Gitaly Cluster

Contact customer support for immediate help in restoration or recovery.

Disk requirements

Gitaly and Gitaly Cluster require fast local storage to perform effectively because they are heavy I/O-based processes. Therefore, we strongly recommend that all Gitaly nodes use solid-state drives (SSDs).

These SSDs should be able to sustain at least:

  • 8,000 input/output operations per second (IOPS) for read operations.
  • 2,000 IOPS for write operations.

These IOPS values are initial recommendations, and may be adjusted to greater or lesser values depending on the scale of your environment’s workload. If you’re running the environment on a cloud provider, refer to their documentation about how to configure IOPS correctly.
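
To check whether a node's disks can sustain these figures, you can run a synthetic random I/O benchmark. The following fio invocation is an illustrative sketch, not GitLab tooling; the target path assumes the default Linux package Git data directory:

```shell
# Benchmark random read/write IOPS (75% reads) against the Gitaly data mount.
sudo fio --name=gitaly-iops-test \
  --filename=/var/opt/gitlab/git-data/fio-testfile \
  --size=4G --bs=4k --ioengine=libaio --direct=1 \
  --rw=randrw --rwmixread=75 --iodepth=64 \
  --runtime=60 --time_based

# Remove the benchmark file afterwards.
sudo rm /var/opt/gitlab/git-data/fio-testfile
```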

For repository data, only local storage is supported for Gitaly and Gitaly Cluster for performance and consistency reasons. Alternatives such as NFS or cloud-based file systems are not supported.

Directly accessing repositories

GitLab doesn’t advise directly accessing Gitaly repositories stored on disk with a Git client or any other tool, because Gitaly is being continuously improved and changed. These improvements may invalidate your assumptions, resulting in performance degradation, instability, and even data loss. For example:

  • Gitaly has optimizations, such as the info/refs advertisement cache, that rely on Gitaly controlling and monitoring access to repositories by using the official gRPC interface.
  • Gitaly Cluster has optimizations, such as fault tolerance and distributed reads, that depend on the gRPC interface and database to determine repository state.
Caution: Accessing Git repositories directly is done at your own risk and is not supported.

Gitaly

The following shows GitLab set up to use direct access to Gitaly:

GitLab application interacting with Gitaly storage shards

In this example:

  • Each repository is stored on one of three Gitaly storages: storage-1, storage-2, or storage-3.
  • Each storage is serviced by a Gitaly node.
  • The three Gitaly nodes store data on their file systems.

Gitaly architecture

The following illustrates the Gitaly client-server architecture:

```mermaid
flowchart LR
  subgraph Gitaly clients
    Rails[GitLab Rails]
    Workhorse[GitLab Workhorse]
    Shell[GitLab Shell]
    Zoekt[Zoekt Indexer]
    Elasticsearch[Elasticsearch Indexer]
    KAS["GitLab Agent for Kubernetes (KAS)"]
  end

  subgraph Gitaly
    GitalyServer[Gitaly server]
  end

  FS[Local filesystem]
  ObjectStorage[Object storage]

  Rails -- gRPC --> Gitaly
  Workhorse -- gRPC --> Gitaly
  Shell -- gRPC --> Gitaly
  Zoekt -- gRPC --> Gitaly
  Elasticsearch -- gRPC --> Gitaly
  KAS -- gRPC --> Gitaly

  GitalyServer --> FS
  GitalyServer -- TCP --> Workhorse
  GitalyServer -- TCP --> ObjectStorage
```

Configure Gitaly

Gitaly comes pre-configured with a Linux package installation, and that configuration is suitable for up to 20 RPS / 1,000 users. For other installation types or larger scales, Gitaly must be configured separately. GitLab installations with more than 2,000 active users performing daily Git write operations may be best suited by using Gitaly Cluster.
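
To confirm that the configured Gitaly storages are reachable from the GitLab application, you can run the bundled check Rake task. This sketch assumes a Linux package installation:

```shell
# Verify that all configured Gitaly storages are reachable.
sudo gitlab-rake gitlab:gitaly:check
```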

Gitaly CLI


The gitaly command is a command-line interface that provides additional subcommands for Gitaly administrators, including the gitaly git subcommand.

For more information on the other subcommands, run sudo -u git -- /opt/gitlab/embedded/bin/gitaly --help.
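
For example, a hedged sketch of two common invocations on a Linux package installation (paths assume default locations, and subcommand availability can vary by Gitaly version):

```shell
# List all available Gitaly subcommands.
sudo -u git -- /opt/gitlab/embedded/bin/gitaly --help

# Assumed example: verify that Gitaly can reach the internal GitLab API.
sudo -u git -- /opt/gitlab/embedded/bin/gitaly check /var/opt/gitlab/gitaly/config.toml
```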

Backing up repositories

When backing up or syncing repositories using tools other than GitLab, you must prevent writes while copying repository data.
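
On a Linux package installation, one way to prevent writes is to stop the services that accept them while the copy runs. The exact set of services depends on your topology; this is a sketch only:

```shell
# Stop the services that accept Git writes before copying repository data.
sudo gitlab-ctl stop puma
sudo gitlab-ctl stop sidekiq
sudo gitlab-ctl stop gitlab-workhorse

# ... copy or sync the repository data with your external tool here ...

# Restart all services when the copy is complete.
sudo gitlab-ctl start
```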

Bundle URIs

You can use Git bundle URIs with Gitaly. For more information, see the Bundle URIs documentation.

Gitaly Cluster

Git storage is provided through the Gitaly service in GitLab, and is essential to the operation of GitLab. When the number of users, repositories, and activity grows, it is important to scale Gitaly appropriately by:

  • Increasing the CPU and memory resources available to Git before resource exhaustion degrades Git, Gitaly, and GitLab application performance.
  • Increasing available storage before storage limits are reached, causing write operations to fail.
  • Removing single points of failure to improve fault tolerance. Git should be considered mission critical if a service degradation would prevent you from deploying changes to production.

Gitaly can be run in a clustered configuration to:

  • Scale the Gitaly service.
  • Increase fault tolerance.

In this configuration, every Git repository can be stored on multiple Gitaly nodes in the cluster.

Using a Gitaly Cluster increases fault tolerance by:

  • Replicating write operations to warm standby Gitaly nodes.
  • Detecting Gitaly node failures.
  • Automatically routing Git requests to an available Gitaly node.
Note: Technical support for Gitaly clusters is limited to GitLab Premium and Ultimate customers.
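
To verify that a running cluster is healthy, Praefect provides diagnostic subcommands. The following is a sketch assuming a Linux package installation with default paths; subcommand availability can vary by version:

```shell
# Run Praefect's built-in startup checks, including database connectivity.
sudo -u git -- /opt/gitlab/embedded/bin/praefect \
  -config /var/opt/gitlab/praefect/config.toml check

# Confirm that Praefect can dial every configured Gitaly node.
sudo -u git -- /opt/gitlab/embedded/bin/praefect \
  -config /var/opt/gitlab/praefect/config.toml dial-nodes
```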

The following shows GitLab set up to access storage-1, a virtual storage provided by Gitaly Cluster:

GitLab application interacting with virtual Gitaly storage, which interacts with Gitaly physical storage

In this example:

  • Repositories are stored on a virtual storage called storage-1.
  • Three Gitaly nodes provide storage-1 access: gitaly-1, gitaly-2, and gitaly-3.
  • The three Gitaly nodes share data in three separate hashed storage locations.
  • The replication factor is 3. Three copies are maintained of each repository.
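
The replication factor for an individual repository can be set with the Praefect CLI. The following is a hedged sketch; the virtual storage name matches this example, and the repository path is a placeholder for the repository's relative path on disk:

```shell
# Set the replication factor of one repository on virtual storage storage-1 to 3.
sudo -u git -- /opt/gitlab/embedded/bin/praefect \
  -config /var/opt/gitlab/praefect/config.toml \
  set-replication-factor \
  -virtual-storage storage-1 \
  -repository "<relative-path-to-repository>" \
  -replication-factor 3
```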

The availability objectives for Gitaly clusters assuming a single node failure are:

  • Recovery Point Objective (RPO): Less than 1 minute.

    Writes are replicated asynchronously. Any writes that have not been replicated to the newly promoted primary are lost.

    Strong consistency prevents loss in some circumstances.

  • Recovery Time Objective (RTO): Less than 10 seconds. Outages are detected by a health check run by each Praefect node every second. Failover requires ten consecutive failed health checks on each Praefect node.

Improvements to RPO and RTO are proposed in an epic.