Recently, I was tasked with optimizing the build and release process of our front-end project. The work falls into two parts: optimizing the Gitlab CI/CD release process, and optimizing Webpack packaging. This article covers the Gitlab CI/CD optimization first.
First, let's review how Gitlab CI/CD works and some of its key concepts.
What is Gitlab CI/CD
CI/CD stands for Continuous Integration, Continuous Delivery, and Continuous Deployment: a series of automated scripts that handle the delivery and deployment of code during development. In simple terms, after a developer pushes code to the remote repository, pre-written scripts automatically package, build, test, and deploy the project.
Since the company's code is hosted on a self-hosted Gitlab server and Gitlab provides CI/CD for free, we chose Gitlab CI/CD as the continuous integration and delivery tool for our projects.
In Gitlab CI/CD, a complete project build release process is called a Pipeline.
A Pipeline consists of several stages. The stages are executed sequentially, and the build task succeeds only when all the stages are complete. If any of the stages fails, subsequent stages do not execute and the build task fails.
Multiple Jobs can be defined in a single Stage. Jobs in the same Stage are executed in parallel, and the Stage succeeds only when all of its Jobs succeed. If any Job fails, the Stage fails, and so does the build task.
Therefore, the relationship between Pipeline, Stage and Job is shown in the figure below:
Each Job goes through the following process:
1. Preparing the “Docker” executor: configure the Docker execution environment, including pulling the Docker image
2. Preparing environment: start the Docker container
3. Getting source from the Git repository: fetch the project source code
4. Restoring cache: restore the cache, if one is configured
5. Downloading artifacts: download and extract artifacts, if any
6. Executing the pre-script
7. Executing the main script
8. Executing the post-script
9. Saving cache: save the cache
10. Uploading artifacts: upload the artifacts produced by the job, if any
11. Cleaning up file-based variables
Process analysis
Currently, our front-end project's CI/CD defines four stages, each containing a single job (a simplified sketch of the corresponding .gitlab-ci.yml follows the list):
1. Build stage, build_prod job: Build the static files and copy them to the home machine
- Install dependencies from lockfile (73.26 seconds)
- Package code to generate build folder (318.94 seconds)
- Send the build folder contents to the remote server
2. Pre-deploy stage, copy_HTML job: Copy the HTML template file to the gate machine
3. Deploy stage, deploy_prod job: Trigger the webhook of the back-end release script to deploy the template files to the service server
4. Post-deploy stage, ding_success job: Send a message after the task succeeds or fails
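To make the structure concrete, here is a simplified sketch of what a four-stage .gitlab-ci.yml along these lines could look like. The script commands, server paths, and the webhook variable are placeholders for illustration, not our real configuration:

```yaml
# .gitlab-ci.yml (sketch) -- four stages, one job each
stages:
  - build
  - pre-deploy
  - deploy
  - post-deploy

build_prod:
  stage: build
  script:
    - yarn install --frozen-lockfile                     # install dependencies from the lockfile
    - yarn build                                         # produce the build folder
    - scp -r build/ user@home-machine:/srv/frontend/     # placeholder destination

copy_HTML:
  stage: pre-deploy
  script:
    - scp build/index.html user@gate-machine:/srv/templates/   # placeholder destination

deploy_prod:
  stage: deploy
  script:
    - curl -X POST "$DEPLOY_WEBHOOK_URL"   # trigger the back-end release script (placeholder variable)

ding_success:
  stage: post-deploy
  when: always                             # run whether the pipeline succeeded or failed
  script:
    - ./scripts/notify_dingtalk.sh         # hypothetical notification helper
```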
As the flow analysis above shows, the bulk of the time is spent packaging the project and caching its dependencies; the Webpack and node_modules optimizations will be covered in more detail in a later article.
CI/CD optimization focuses on how to improve the ability to process tasks in parallel and how to shorten the time spent on a single task.
Optimization scheme
Set the Taobao registry
Installing dependencies takes 86.23s when the Taobao registry is not set:
Running yarn config set registry 'https://registry.npm.taobao.org' itself takes only 0.15s.
With the registry set, installing the same dependencies takes 72.06s:
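For illustration, a minimal sketch of wiring the registry switch into the build job's before_script (the job layout here is an assumption, not our exact file):

```yaml
# .gitlab-ci.yml (sketch) -- switch yarn to the Taobao registry before installing
build_prod:
  stage: build
  before_script:
    # point yarn at the Taobao npm mirror; the command itself takes ~0.15s
    - yarn config set registry 'https://registry.npm.taobao.org'
  script:
    - yarn install --frozen-lockfile
    - yarn build
```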
So setting the Taobao registry does shorten dependency installation time. However, in our company's CI/CD flow node_modules is cached, and each CI/CD run uses the cached node_modules whenever possible; dependencies are installed only when they change. For a mature, long-running project, new dependencies are rarely added, so the effect of this optimization is limited.
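For reference, this kind of node_modules caching is usually done with the cache keyword keyed on the lockfile; a minimal sketch, with the key and paths as assumptions rather than our exact setup:

```yaml
# .gitlab-ci.yml (sketch) -- reuse node_modules across pipeline runs
build_prod:
  stage: build
  cache:
    key:
      files:
        - yarn.lock        # invalidate the cache only when the lockfile changes
    paths:
      - node_modules/
  script:
    - yarn install --frozen-lockfile
    - yarn build
```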
Run pipelines in parallel
I happened to notice that while one development branch of our front-end project was running CI/CD, the CI/CD of other branches went into a waiting state. I asked the colleague who previously maintained the company's CI/CD, and it turned out he had restricted the front-end project's CI/CD with the resource_group keyword.
When the resource_group keyword is defined for a job in the .gitlab-ci.yml file, job execution is mutually exclusive across different pipelines of the same project. If multiple jobs belonging to the same resource_group are queued at the same time, only one of them is picked to run; the other jobs wait until the resource_group is free.
The previous colleague put all jobs into the same resource_group. His concern was that if a branch's CI/CD had not finished and someone pushed new code to that branch, the branch would have two pipelines running at the same time and nobody would know which pipeline's build was deployed last.
This does force the pipelines of a single development branch to run serially. However, while a job on one development branch is running, not only are the other jobs of that branch waiting, but so are the jobs of every other development branch; in other words, all jobs of the whole project run serially. One development branch does not care at all about the CI/CD running on other branches, so this approach limits Gitlab CI/CD's ability to process tasks in parallel.
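To illustrate the problem, here is a minimal sketch of that kind of configuration, with every job sharing one resource_group (the group name and script commands are placeholders):

```yaml
# .gitlab-ci.yml (sketch) -- a single shared resource_group serializes the whole project
build_prod:
  stage: build
  resource_group: frontend          # every job below uses the same group
  script:
    - yarn build

copy_HTML:
  stage: pre-deploy
  resource_group: frontend
  script:
    - ./scripts/copy_html.sh        # hypothetical helper script

deploy_prod:
  stage: deploy
  resource_group: frontend
  script:
    - ./scripts/trigger_deploy.sh   # hypothetical helper script
```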
If the goal is simply to prevent pipelines of a single development branch from overlapping, adding interruptible: true to the CI/CD configuration file is enough.
With interruptible: true, jobs that are still running are cancelled automatically when a new pipeline is started on the same branch.
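A minimal sketch of this change, here applying interruptible to all jobs via default (the surrounding job is just an example, not our exact file):

```yaml
# .gitlab-ci.yml (sketch) -- cancel outdated pipelines on the same branch
default:
  interruptible: true   # a new push to a branch cancels its still-running pipeline

build_prod:
  stage: build
  script:
    - yarn install --frozen-lockfile
    - yarn build
```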
It is now possible to run the packaging tasks of multiple pipelines at the same time, and outdated pipelines on the same branch are cancelled automatically:
Parallel processing of unrelated operations
From the process analysis above, a single pipeline has two stages related to the back end, and their task is to copy the packaged front-end files to different servers. So I asked my back-end colleague whether the tasks of these two stages could be moved into different jobs of the same stage and executed in parallel. His reply was that the two stages have a dependency and must run serially, so parallelizing the back-end related tasks is not feasible.
Merge jobs that must be executed serially
A GitLab runner needs some time for every job it runs to prepare the execution environment, save the cache, and so on. Since the two back-end related operations must run sequentially anyway, could they be combined into a single job to save that overhead? However, the back-end colleagues found it inconvenient to debug when all operations live in one job, so merging the jobs is not feasible either.