Since its launch in 2013, Gitee's data volume has roughly doubled every year. As of March 2021, Gitee serves more than 6 million developers and hosts more than 15 million repositories, making it the leading R&D collaboration platform in China. To keep up with that growth, Gitee's architecture has gone through several iterations. I have presented Gitee's architecture at many conferences and discussed it with many people facing similar scenarios, and I was occasionally asked whether there was a dedicated article introducing it. So I took advantage of the rare free time during the holiday to write this topic up for your reference.

As the fastest-growing code hosting platform in China, Gitee sees its data grow rapidly every day. With the spread of DevOps practices, continuous builds bring ever more requests and higher concurrency to the platform, which now handles tens of millions of Git operations every day. Gitee's architecture evolved iteratively along the way. Looking back, its development can be divided into five stages:

  • Single machine architecture

  • Distributed Storage Architecture

  • The NFS architecture

  • Self-developed sharding architecture

  • Rime read-write separation architecture

Let's walk through the evolution of Gitee's architecture stage by stage.

Single machine architecture

Gitee was launched in May 2013 as a single Rails application that handled all requests.

Apart from MySQL and Redis being deployed on a separate machine, Gitee differs from most web applications in that it needs to store a large number of Git repositories: whether a repository is read or written through the web UI or operated on with Git, the application has to work directly on the bare repositories on the server. This single-machine architecture is fine while traffic is low, for example for internal use within a team or company, but once it is offered as a public cloud SaaS service, the pressure becomes more and more obvious as traffic and usage grow, mainly in two respects:

  1. Pressure on storage space

  2. Pressure on computing resources

Thanks to the influence of the Open Source China (OSCHINA) community, Gitee had a steady stream of users from the moment it launched, so there was no need to worry about seed users. Instead, as community usage grew, the first problem to emerge was storage pressure. We were running on Alibaba Cloud virtual machines, and the largest disk we could choose was 2 TB. We later managed to expand somewhat through other channels, but the physical machine behind the cloud host was only 1U and could hold at most four disks, so once storage approached 8 TB there was no better way to expand capacity other than attaching external storage devices.

In addition, as usage increased, the peak hours of every working day, around 9 a.m. and 5 p.m., were also the peak for pushing and pulling code, and the machine's IO was running at nearly full load, so the whole system became very slow during those periods. Expanding the system was urgent. After discussion, the team decided to adopt the distributed storage system Ceph, and after a series of not particularly rigorous "verifications" (which turned out to be the root cause of the later problems), we purchased machines and began to scale the system out.

Distributed Storage Architecture

Ceph is a distributed storage system whose main design goal is a POSIX-compatible distributed file system with no single point of failure that can easily scale to petabyte-level capacity. The idea at the time was to use Ceph's horizontal scalability and high reliability to expand the storage system, and to put several groups of stateless application servers on top of that shared Ceph storage, thereby also expanding compute resources.

So in July 2014 we purchased a batch of machines, built and verified the system, and then picked a weekend to migrate and go live. Functional verification after the migration looked fine, but on the following workday, as traffic rose, everything started going downhill and the whole system became extremely slow. Investigation showed that the bottleneck was Ceph IO, so we urgently brought in an iSCSI storage device and began migrating data to it to share the load. Then the Ceph RBD block device suddenly unmounted, all the repository data disappeared, and the whole cluster, and the community, blew up. After 14 hours of analysis and research we managed to remount the device, then moved the data to the iSCSI storage at full speed, and the storm gradually subsided. In hindsight, the problems boiled down to two things:

  • Read/write performance bottlenecks with massive numbers of small files

  • The RBD block device being unmounted unexpectedly

Later research made it clear that distributed storage systems are simply not suited to a workload like Git, which consists of massive numbers of small files: every Git operation has to traverse a large number of references and objects, so each operation becomes slow end to end. GitHub also mentioned in a blog post that distributed storage systems are not suitable for Git workloads. As for the 14 hours it took us to recover the unmounted block device, that was the price of using a tool without thoroughly understanding it. After this painful lesson, we became far more cautious and careful about every subsequent change.

The NFS architecture

However, the storage and compute pressure was still there and still urgently needed solving. What to do? As a stopgap, we fell back on a fairly primitive approach: the solution officially recommended by GitLab at the time (2014).

This scheme shares disks over NFS and scales compute by standing up several application instances on top of them. But since storage now goes over the network, some performance loss is inevitable, and in practice Git's operation patterns are complicated enough to cause a series of problems:

  • Intranet bandwidth bottleneck

  • NFS performance problems causing an avalanche effect

  • NFS cached files not being completely deleted

  • No easy horizontal scaling of storage, and poor maintainability

Intranet Bandwidth Bottleneck

Because storage is mounted over NFS, a large repository, say more than 1 GB, consumes a great deal of intranet bandwidth whenever it is cloned. Our servers generally had 1 Gbps network ports (roughly 125 MB/s), so a single clone of a repository that size can occupy the link for several seconds, and a handful of concurrent clones easily saturates the NIC. The result is that other repositories slow down and many requests get blocked. And that is not even the worst case: the worst case is when the network ports used by internal services are fully occupied, causing severe packet loss for MySQL, Redis and other services, which slows down the entire system. The workaround at the time was to move the core services onto other network ports, but the NFS bandwidth problem itself could not be solved.

NFS performance problems cause an avalanche effect

If the IO of any one NFS storage machine becomes too slow while every application machine has read/write requests pending against it, the whole system gets stuck. Under this architecture, the system is therefore extremely fragile.

NFS cached files are not completely deleted

This one was a real headache. To improve file read/write performance we had NFS's in-memory caching enabled. As a result, a file on the NFS storage could be deleted from one machine yet still appear to exist in the cache of another machine, which broke some application logic.

For example, during a push Git creates a .lock file to prevent problems from other clients pushing at the same time: if we push to the master branch, the server creates a master.lock file, and no other client can push to master while it exists; Git removes master.lock automatically once the push completes. Because of the caching behaviour above, however, there were cases where one application machine had finished processing the push and deleted master.lock, yet the file still existed in the cache of another application machine. The fix would be to turn off NFS's in-memory caching, but that hurts performance, so it was a hard trade-off. Fortunately the problem was rare, and we chose to live with it for the sake of performance.

Poor maintainability

As is so often the case, for historical reasons the application's storage directory layout was fixed, so we could only expand capacity by splicing new space into that directory tree with symbolic links, and every expansion meant mounting another NFS storage device under the directory. The mounts on each application machine in the system therefore became very complicated:

```
git@gitee-app1:~$ df -h
Filesystem           Size  Used  Avail  Use%  Mounted on
/dev/sda1            184G   15G   160G    9%  /
/dev/sda2            307G   47G   245G   16%  /home
172.16.3.66:/data     10T   50G   9.9T    1%  /data
172.16.30.1:/disk1    10T   50G   9.9T    1%  /disk1
172.16.30.2:/disk2    10T   50G   9.9T    1%  /disk2
172.16.30.3:/disk3    10T   50G   9.9T    1%  /disk3
172.16.30.4:/disk4    10T   50G   9.9T    1%  /disk4
172.16.30.5:/disk5    10T   50G   9.9T    1%  /disk5
172.16.30.6:/disk6    10T   50G   9.9T    1%  /disk6
172.16.30.7:/disk7    10T   50G   9.9T    1%  /disk7
...
```

Just looking at this mount table is enough to make any ops engineer cry. It was extremely hard to maintain, and carrying on this way was bound to spiral out of control sooner or later.

Self-developed sharding architecture

NFS could carry us for a while, but it was not sustainable, so we had to look for a change and improve the architecture. The ideal approach, of course, would be GitHub's sharding architecture, in which the application and the repository operations are separated and communicate over RPC, which makes both scaling and maintenance much easier.

But such a transformation would have required rewriting the application itself, which was expensive and slow, and given the situation at the time we had almost no R&D resources to spare for architecture work. During the architecture discussions, one of our front-end colleagues (nickname: Giant Panda) raised an idea: since the application couldn't be split apart, why not do the shard routing one layer up, in front of the web application?

A digression: it is essential to encourage "questioning" inside a team and to foster an atmosphere of discussion; that is how a team ends up doing genuinely valuable things. So to every team member, and especially to developers: never be afraid to speak up. A small idea of yours may have a very long-lasting impact on the team. Giant Panda's remark, for instance, directly set the future direction of Gitee's architecture. I hope we can eat bamboo together sometime. :D

And so the first version of this architecture was born. We left the application's structure untouched and allowed the application to be stateful: each group of application servers is bound to a batch of repositories, and all we need is to identify which repository a request belongs to and dispatch it to the corresponding application for processing.

From a business perspective, requests on Gitee fall into three categories:

  1. HTTP(S) requests: browsing repositories on the web, and Git operations on code over HTTP(S)

  2. SSH requests: Git operations on code over SSH

  3. SVN requests: a Gitee-specific feature that lets users operate on a Git repository with an SVN client

So all we have to do is shard these three kinds of requests: extract the repository information from each request, find the machine responsible for that repository, and forward the request there. To do this we developed three components that act as routing proxies for the three types of requests.

Miracle HTTP(S) dynamic distribution proxy

This component is a customized build of Nginx. Its main job is to inspect the URL, extract the repository's namespace, and proxy the request according to that namespace. For example, as shown in the figure above, when the repository https://gitee.com/zoker/taskover is requested, Miracle learns from the URL that this request targets a repository under the zoker namespace. Miracle first looks up the route for user.zoker in Redis; if it is not there, it queries the database and caches the result in Redis to speed up later lookups of the routing IP. Once it has the IP, Miracle dynamically proxies the request to the corresponding backend, App1, and the user correctly sees the contents of the repository.

Route dispatch has to be accurate: if the lookup for user.zoker returned the wrong IP, the user would see an empty repository, which is not what we want. Requests that do not touch repository resources at all, such as login or the activity feed, are simply distributed to any backend machine at random, since every machine can handle them.
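For illustration, here is a minimal sketch of this routing idea in Go. The real Miracle component is a customized Nginx backed by Redis and MySQL; the backend addresses, the in-memory map standing in for Redis, and the queryRouteFromDB stub below are all assumptions made so the example is self-contained and runnable.

```go
// Hypothetical sketch of namespace-based request routing (not Gitee's code).
package main

import (
	"math/rand"
	"net/http"
	"net/http/httputil"
	"net/url"
	"strings"
)

// routeCache maps a namespace (e.g. "zoker") to the backend holding its repositories.
// In the real system this lookup goes to Redis, falling back to the database.
var routeCache = map[string]string{}

// allBackends is used for requests that are not tied to any repository.
var allBackends = []string{"http://app1.internal:3000", "http://app2.internal:3000"}

func lookupBackend(namespace string) string {
	if b, ok := routeCache[namespace]; ok {
		return b
	}
	b := queryRouteFromDB(namespace) // stub for the real database lookup
	routeCache[namespace] = b        // cache the route to speed up later requests
	return b
}

func queryRouteFromDB(namespace string) string {
	// Placeholder: a real implementation would read the namespace -> shard mapping.
	return allBackends[0]
}

func main() {
	proxy := &httputil.ReverseProxy{
		Director: func(req *http.Request) {
			// URLs look like /zoker/taskover[/...]; the first segment is the namespace.
			parts := strings.SplitN(strings.TrimPrefix(req.URL.Path, "/"), "/", 3)
			target := allBackends[rand.Intn(len(allBackends))] // non-repository requests go anywhere
			if len(parts) >= 2 && parts[0] != "" {
				target = lookupBackend(parts[0])
			}
			u, _ := url.Parse(target)
			req.URL.Scheme, req.URL.Host = u.Scheme, u.Host
		},
	}
	http.ListenAndServe(":8080", proxy)
}
```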

SSH/SVN dynamic distribution proxies

The SSHD component is mainly responsible for distributing Git-over-SSH requests and is developed on top of libssh. SVNSBZ is the dynamic distribution proxy for SVN requests. Both follow the same logic as Miracle, so I won't repeat it here.

Remaining problems

After this architecture went live, there was a big improvement both in load capacity and in operation and maintenance cost. However, architecture evolution never ends and there is no silver bullet; this architecture still had some problems:

  • Sharding by user makes the atomic unit too coarse

  • Git-over-HTTPS requests are still handled by GiteeWeb, so the two interfere with each other

  • The authentication APIs used by Git SSH and SVN operations are still served by GiteeWeb

  • Excessive load on a single repository is still unsolved

Because the shard's atomic unit is a user or an organization, if a single user owns too many repositories, or repositories that are too large, a single machine may not be able to handle them. We do limit the number and size of repositories one user may create at the application level, but the scenario is bound to appear eventually, so it has to be planned for in advance. Likewise, if a single repository receives too much traffic, for example a very popular open source project, a single machine may still fail to keep up in extreme cases, so the design was still flawed.

In addition, Git requests involve authentication, and all of that authentication still went through GiteeWeb's interfaces. Git-over-HTTPS operations were also still handled by GiteeWeb itself; unlike SSH, there was no dedicated component for them, so the coupling remained too tight.

Based on the above problems, we further improved the architecture, mainly making the following changes:

  • Shard with the repository as the atomic unit

  • Split the Git-over-HTTPS service out of GiteeWeb

  • Split out the APIs related to SSH and SVN operations

Sharding by repository makes the routing atomic unit smaller, which makes routes easier to manage and expand. Repository routes are keyed by the namespace/repository address, i.e. a key like zoker/taskover.

The main purpose of splitting Git's HTTP(S) operations out is to keep them from affecting web access: Git operations are time-consuming and have very different characteristics, so mixing the two makes them interfere with each other. Making the authentication API independent likewise relieves pressure on GiteeWeb: push and pull operations are extremely numerous, so the API receives a huge amount of traffic, and mixing it with ordinary users' web requests makes interference very likely.

After this decoupling, GiteeWeb became much more stable, with instability caused by the API and Git operations dropping by about 95%. The overall composition of the architecture's components looks roughly like this:

Remaining problems

Although overall stability improved, we still had to think about the extreme cases: what if a single repository is too large? What if a single repository gets too much traffic? Fortunately the system can cap the size of a single repository, but what about an extremely hot repository? How do you absorb sudden, massive concurrent access to it?

Rime read-write separation architecture

As the country's largest R&D collaboration platform and a leading code hosting platform, Gitee hosts many open source projects in its ecosystem, including some extremely hot repositories, and it is the code hosting platform of choice for universities, training institutions and hackathons, so bursts of heavy concurrent access are common. The main problems with the architecture at this point were that the backup machines were cold standbys that could not be put to effective use, and that excessive request load on a single repository was still unsolved.

Why the Rime architecture?

It was Huawei's arrival on Gitee that made us take this issue really seriously. Starting in 2020, Huawei successively open-sourced MindSpore, openEuler and other frameworks on Gitee, and the pressure on individual repositories gradually became apparent. To prepare for the open-sourcing of the HarmonyOS (Hongmeng) operating system in September 2020, which attracted worldwide attention, we spent the first half of 2020 continuously optimizing the architecture so that IO for a single repository could be spread across multiple machines. That is our current Rime read-write separation architecture.

How it works

To let multiple machines serve reads, we have to think about synchronization consistency between copies of a repository. Imagine that a read request is dispatched to a standby machine just after code was pushed to the host: the user would see the repository as it was before the push, which is a serious problem. So how do we guarantee that what the user sees on a standby machine is also the latest code? Or, how do we guarantee timely synchronization? We rely on the following logic:

  1. Write operations go to the host

  2. The host initiates synchronization to the standby machines

  3. The synchronization state is actively maintained, and read routing is decided based on that state

As shown in the figure above, repository operations are divided into reads and writes. Reads are normally distributed evenly across the machines, so with one host and two standbys a repository's read capacity is in theory tripled, other factors aside. But since repositories are also written to, the standbys have to be synchronized, and as mentioned above, if synchronization lags, users will read stale code, which is clearly unacceptable.

To solve this problem, we use Git hooks: whenever a repository is written to, the hook triggers a task on a synchronization queue. That task is responsible for:

  1. Synchronizing the repository to the standby machines

  2. Verifying the consistency of the synchronized copies

  3. Managing and updating the synchronization state

When a push lands in a repository, the Git hook triggers a synchronization task, which actively ships the increment to the configured standby machines. Once synchronization finishes, a consistency check is run over the references, using the BLAKE3 hash algorithm, to verify that the copies of the repository are identical after synchronization.
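To make the shape of that task concrete, here is a minimal Go sketch, my own illustration rather than Gitee's code: it ships the increment to a standby machine and then compares a digest of all references on both sides. The real check uses BLAKE3; the sketch substitutes the standard library's SHA-256 purely to stay dependency-free, and shipIncrement/listRefs are hypothetical stubs for the actual Git transfer and ref listing (e.g. a push plus `git for-each-ref`).

```go
// Sketch of the hook-triggered synchronization task (assumptions noted above).
package main

import (
	"crypto/sha256"
	"fmt"
	"sort"
	"strings"
)

// refsDigest builds one stable digest over every "ref -> commit" pair of a copy,
// so two copies can be compared after synchronization.
// (The real system uses BLAKE3; SHA-256 is used here only for self-containment.)
func refsDigest(refs map[string]string) [32]byte {
	names := make([]string, 0, len(refs))
	for name := range refs {
		names = append(names, name)
	}
	sort.Strings(names)
	var b strings.Builder
	for _, name := range names {
		fmt.Fprintf(&b, "%s %s\n", name, refs[name])
	}
	return sha256.Sum256([]byte(b.String()))
}

// syncToStandby is what the queue runs once per push for each standby machine.
func syncToStandby(repo, standby string) error {
	if err := shipIncrement(repo, standby); err != nil { // ship new objects/refs
		return err
	}
	if refsDigest(listRefs(repo, "host")) != refsDigest(listRefs(repo, standby)) {
		return fmt.Errorf("refs of %s differ between host and %s after sync", repo, standby)
	}
	return nil
}

// Hypothetical stubs standing in for the real transfer and ref-listing steps.
func shipIncrement(repo, standby string) error        { return nil }
func listRefs(repo, machine string) map[string]string {
	return map[string]string{"refs/heads/master": "0a1b2c3"}
}

func main() {
	if err := syncToStandby("zoker/taskover", "app1bakA"); err != nil {
		fmt.Println("sync failed:", err)
		return
	}
	fmt.Println("app1bakA is consistent with the host")
}
```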

As for state management, when the task is triggered it first marks the repository as out of sync on both standby machines. Our dispatch components only send read operations to the host or to machines marked as in sync. Once synchronization and the consistency check have both completed, the corresponding machine is marked as synced again, and reads are dispatched to that standby once more. If synchronization fails, as in the figure above where syncing to App1bakA succeeds but App1bakB fails, reads can still be distributed normally to App1bakA, while no reads are sent to the unsynchronized App1bakB, which avoids inconsistent access.
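Here is a sketch of the state management and read dispatch just described, again only an illustration under the assumptions above (the machine names follow the App1/App1bakA/App1bakB example): a push marks every standby as out of sync, a standby is marked synced again only after synchronization and the consistency check both succeed, and reads are only ever routed to the host or to synced standbys.

```go
// Sketch of per-repository sync state and read dispatch (not Gitee's code).
package main

import (
	"fmt"
	"math/rand"
)

type RepoRoute struct {
	Host     string          // the only machine that accepts writes
	Standbys []string        // read replicas
	inSync   map[string]bool // per-standby synchronization state
}

func NewRepoRoute(host string, standbys ...string) *RepoRoute {
	r := &RepoRoute{Host: host, Standbys: standbys, inSync: map[string]bool{}}
	for _, s := range standbys {
		r.inSync[s] = true
	}
	return r
}

// OnPush is called when a write lands on the host: no standby may serve reads
// until it has been synchronized and verified again.
func (r *RepoRoute) OnPush() {
	for _, s := range r.Standbys {
		r.inSync[s] = false
	}
}

// OnSynced is called after synchronization and the consistency check succeed.
func (r *RepoRoute) OnSynced(standby string) { r.inSync[standby] = true }

// PickReadTarget returns the host or any standby that is currently in sync.
func (r *RepoRoute) PickReadTarget() string {
	candidates := []string{r.Host}
	for _, s := range r.Standbys {
		if r.inSync[s] {
			candidates = append(candidates, s)
		}
	}
	return candidates[rand.Intn(len(candidates))]
}

func main() {
	route := NewRepoRoute("App1", "App1bakA", "App1bakB")
	route.OnPush()                                 // a push arrives: both standbys are now stale
	route.OnSynced("App1bakA")                     // App1bakA synced and verified; App1bakB failed
	fmt.Println("read ->", route.PickReadTarget()) // App1 or App1bakA, never App1bakB
}
```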

Architectural achievements

With the switch to read-write separation, the system can comfortably cope with very heavy access to a single repository. On September 10, 2020, Huawei's HarmonyOS (Hongmeng) operating system was officially open-sourced on Gitee. The moment this highly anticipated project opened up, it brought enormous traffic and a huge number of repository download operations to Gitee. Thanks to the thorough preparation beforehand and the read-write separation architecture, single-repository load capacity was greatly improved, and the open-source launch of HarmonyOS went off without a hitch.

Follow-up optimizations

Some of you may already be wondering: if a repository is being written to frequently and is also receiving a large amount of traffic, doesn't a single machine end up handling all of the requests? The answer is yes, but that is not a normal situation; writes are normally far less frequent than reads, and if it does happen it usually means we are under attack. So we also added a per-repository limit on maximum concurrency in the dispatch components, set to a reasonable value based on our experience operating Gitee, so that it does not affect normal users.

Architecture optimization never ends, though, and for the case above there is still room for improvement. The main idea is that when an update is pushed, the push is only acknowledged as successful once the standby machines, or at least some of them, have synchronized successfully. The drawback is that this lengthens the push, but it solves the single-host read problem nicely. The current architecture is multi-read, single-write; if write-heavy scenarios appear later, we can consider moving to multi-read, multi-write, provided the state and conflict management is handled properly.

The future

The biggest problem with the current architecture is that the application and the repository operations are still not separated, which badly limits the architecture's scalability. So what we are doing now, and will keep doing, is splitting these services apart and optimizing other aspects:

  1. Split repository operations out and call them via RPC

  2. Separate the application's front end and back end

  3. Split out services such as queues and notifications

  4. Automatic on-demand scaling of hot repositories

  5. Allocate new repositories according to machine capacity

  6. ...

Finally

From Gitee's launch in 2013, it was not until the self-developed sharding architecture went live in 2017 that Gitee truly escaped its state of "trouble at home and abroad". The trouble at home was the instability caused by an architecture that could not support the traffic; the trouble abroad was the difficulty of withstanding external DDoS and CC attacks. Fortunately, once the architecture was properly improved, these long-standing problems could be handled with ease.

There is a well-worn saying that talking about technology divorced from its scenario is just posturing, and the same goes for architecture: discussing architecture without its context is meaningless. Much of the work we do may only solve the problems of today or of the next few years, but we need vision, and we need to anticipate future product development, data growth and feature enhancements. Only then can the architecture keep adapting to this fast-evolving field, and in turn better serve enterprises and empower developers.

The original article was written by Zhou Kai, head of Gitee, and published on the WeChat official account "Zoker Essays".