Recently our team's work was split along business lines into a number of separate projects, most of them built on Node.js, which remains the language our group uses.
Some problems with existing processes
When maintaining multiple projects, there are some problems:
- How to make test cases effective
- How to enforce ESLint effectively
- Could deployment be any faster
- The extra cost of using TypeScript

Test cases
First, the test cases. We originally ran them in git hooks, so they were checked before every commit. This creates a timing problem: it is fine for day-to-day development, but for urgent online bug fixes, running the test suite can take several minutes depending on the size of the project. To ship a fix quickly, a developer can add -n (--no-verify) to the commit to skip the hooks. That is understandable when fixing a bug, but there is no way to control it: the check runs locally, so whether people follow the rule depends entirely on their own discipline.
So after a while it turned out that running test cases this way was not a very effective way to reduce risk.
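For context, the git-hook setup described above is roughly the following (a sketch; the actual test command depends on the project):

#!/bin/bash
# .git/hooks/pre-commit — runs the test suite before each commit;
# `git commit -n` bypasses this hook entirely
npm test || {
  echo "tests failed, commit aborted"
  exit 1
}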
ESLint
Then there’s ESLint. Our team maintains a set of rules better suited to us, based on Airbnb’s ESLint config. We install editor plugins to highlight errors and do some automatic formatting, and git hooks check the code during git commit; commits that do not conform to the spec are rejected. But this has the same problems as the test cases:
- There is no way to know whether the ESLint plugin is installed in the editor, or whether its error messages are simply ignored
- The git hooks can be bypassed
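For context, the kind of shared config mentioned above might look roughly like this (.eslintrc.yml extending Airbnb's base rules; the overrides are purely illustrative, not our actual rule set):

extends: airbnb-base
rules:
  # illustrative team overrides
  semi:
    - error
    - never
  no-console: off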
Deployment mode
The team previously deployed with Shipit and its peripheral packages. The deployment is highly dependent on the local environment: it creates a temporary clone of the repository on the local machine, and goes live by running ssh XXX "command" several times. Shipit does provide an effective rollback scheme: it keeps multiple historical versions under the deployment path, and rolling back means pointing the currently running project directory at one of the previous versions. The downsides are that it is hard to tell which node to roll back to, and keeping the history costs extra disk space; and partly because of this, Shipit is awkward when deploying to multiple servers.
With multiple servers you can batch-deploy by listing several target addresses in the Shipit config file. But suppose one day some small-traffic release needs to go out on just one of the four machines. Because of the Shipit rollback strategy described above, that machine's historical version timestamps no longer match the other three (they did not go live at the same time). Worse, the timestamps come from local time. We once hit a case where a colleague had set his local clock back a few days to test some code, forgot to restore it, and then deployed; when the code turned out to be broken, the rollback failed, because the deployed version's timestamp was smaller than everything Shipit had kept (Shipit retains a limited number of historical versions, and the earliest retained timestamp was newer than the problematic one).
In other words, once you do a single small-traffic release, you can no longer safely use the batch release feature.
Given the above, our deployment time becomes: (number of machines) × (time for the repository clone plus several SSH operations, all bounded by local network speed). P.S. To guarantee the repository is valid, Shipit deletes the previous copy and clones it afresh on every deployment.
This is especially painful for server-side projects: urgent bug fixes sometimes happen outside business hours, when your network environment may not be stable. I once got a WeChat message from a colleague in the evening asking me to help him release, because the Wi-Fi he was on kept failing while cloning the project. People have also released over a phone's hotspot; one colleague, deploying a project without front/back-end separation, promptly received an SMS from China Unicom: "your traffic this month has exceeded XXX" (back when contract plans came with 800 MB of data per month).
TypeScript
Since the second half of last year, our team has been pushing TypeScript, because explicit types make a large project much easier to maintain. As everyone knows, TypeScript ultimately has to be compiled down to JavaScript (there are also ways to run TS without emitting JS files, but those are better suited to local development; for code running online we want as few variables as possible).
So the previous release process needed an extra step to compile the TS. And because Shipit clones and deploys the repository itself, the generated JS files had to be committed to the repository too, which, at the most superficial level, makes the repository look ugly (50% TS, 50% JS) and further raises the cost of going live.
In summary, the existing release process depends too much on the local environment; because everyone's environment differs, this adds many uncontrollable factors to deployment.
How to solve these problems
Some of the problems we encountered above can be divided into two parts:
- Effective constraints on code quality
- Fast deployment and release
So we went looking for a solution. Since our source code is managed in our own GitLab instance, GitLab CI/CD was the first thing we found, and after studying the documentation it looked like a good fit for the problems we were facing.
Using GitLab CI/CD is as simple as installing gitlab-runner on a spare server and registering the projects that will use CI/CD with that service. Installation and registration are covered in the official GitLab documentation:
install | registering a runner | registering a group runner
We chose group registration above, which covers every project under a GitLab group. The main reason is that we have too many projects here, and registering them one by one is tedious (registration can only be done by logging in to the runner server and running commands).
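For reference, a non-interactive registration along these lines could look like the following (the URL, token and description are placeholders):

sudo gitlab-runner register \
  --non-interactive \
  --url "https://gitlab.example.com/" \
  --registration-token "GROUP_REGISTRATION_TOKEN" \
  --executor "shell" \
  --description "node-group-runner"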
Things to note during installation
The official walkthrough is quite detailed, but a few tips will help you avoid pitfalls.
sudo gitlab-runner install --user=gitlab-runner --working-directory=/home/gitlab-runner
This is the Linux install command; it requires root (administrator) privileges and takes two parameters:
- --user is the user that CI/CD jobs run as (everything that follows runs as this user)
- --working-directory is the root directory path CI/CD uses at execution time

My personal lesson learned: put this directory on a large disk, because CI/CD generates a large number of files, especially if you use CI/CD to compile TS files and cache the generated JS; such an operation can exhaust the disk's inodes and cause problems.
Since --user means CI/CD jobs are executed as that user, if you want to write helper scripts for the runner, it is recommended to switch to that user first (sudo su gitlab-runner) and write them there, to avoid permission problems.
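If you suspect inode pressure on that disk, a quick way to check (the path is illustrative):

# show inode usage for the runner's working directory
df -i /home/gitlab-runner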
What to look out for when registering
Following the official walkthrough, we left the tag field blank; we have not yet found a need for it. The executor choice matters more: since we were coming from manual (or near-manual) deployment, the safe route was to change one step at a time, so we chose shell, the most ordinary execution mode with the least impact on the project (the official example uses Docker).
The .gitlab-ci.yml configuration file
With the environment above installed, the next step is to make CI/CD actually run and tell the runner what to run. By convention, this configuration file lives at the root of the repository; once it exists there, every git push automatically executes the actions it describes.
- Quick start
- Configuration reference
The two links above are very complete and contain various configurable options.
In general, a configuration file is structured like this:
stages:
  - stage1
  - stage2
  - stage3

job 1:
  stage: stage1
  script: echo job1

job 2:
  stage: stage2
  script: echo job2

job 3:
  stage: stage2
  script:
    - echo job3-1
    - echo job3-2

job 4:
  stage: stage3
  script: echo job4
stages declares the valid stages, executed in the order they are declared. The job names (job 1 and so on) are not important in themselves; they are simply what appears in the GitLab CI/CD Pipeline interface. The important attribute is stage, which assigns the job to a stage. script is the script content to execute; to run multiple commands, write it like job 3.
If we replace the stages and jobs with install_dependencies, test, eslint and the like for our project, and replace the script values with commands such as npx eslint, then as soon as you push this file to the remote, your project will run these scripts automatically, and you can watch each step's status in the Pipelines interface.
P.S. By default, a stage is not executed until the previous stage has completed, but this can be modified with additional configuration: allow_failure, when.
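For instance, marking a job with allow_failure lets later stages proceed even if it fails (continuing the illustrative example above):

job 5:
  stage: stage2
  script: echo job5
  allow_failure: true # a failure here will not block stage3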
Set CI/CD to trigger only under specific circumstances
One problem with the configuration above is that it never says which branches trigger the CI/CD process, so by default commits on every branch trigger it, which is definitely not what we want. Running CI/CD drains system resources, and it is not worth letting runs on some development branch delay runs on the main branch.
So we need to define which branches trigger these processes, which is what the only attribute in the configuration is for: it specifies the branch(es) that trigger CI/CD.
Detailed configuration documentation
job 1:
  stage: stage1
  script: echo job1
  only:
    - master
    - dev
Individual jobs can be configured this way, but as the number of jobs grows, it means repeating the same lines throughout the configuration file, which is not a pleasant thing to do. So some YAML syntax can be used here:
This is an optional step, just to reduce some duplication of code in the configuration file
.access_branch_template: &access_branch
  only:
    - master
    - dev

job 1:
  <<: *access_branch
  stage: stage1
  script: echo job1

job 2:
  <<: *access_branch
  stage: stage2
  script: echo job2
This works like template inheritance. It is not mentioned in the official documentation; it is just a way to reduce redundant code, and entirely optional.
Cache the necessary files
By default, CI/CD cleans the working directory before each job, to guarantee that the directory is clean and contains no data or files left over from previous jobs. But that creates a problem in our Node.js projects, because ESLint, the unit tests and so on all depend on packages under node_modules. As things stand, every step would effectively have to run npm install first, which is an obvious and unnecessary waste.
This brings us to another option in the configuration file: cache, which marks files and folders that should be kept and not cleared:
cache:
  key: ${CI_BUILD_REF_NAME}
  paths:
    - node_modules/
CI_BUILD_REF_NAME is an environment variable provided by CI/CD; its value is the branch or tag name the pipeline is running for, so using it as the cache key keeps each branch's cache independent of the others.
Deploying the project
With the configuration above plus scripts for unit tests and ESLint, things behave the way we want: if a step fails, the pipeline stops there and does not continue. At this point, however, nothing actually ships, so it is time to take over the deployment step.
For deployment we currently use rsync to synchronize files to multiple servers, a relatively simple and efficient deployment method.
P.S. Deployment needs one extra thing: a machine trust relationship between the gitlab-runner user on the runner machine and the user on each target server. There are N ways to do this; the simplest is to run ssh-copy-id on the runner machine to write its public key to the target machine. Or, as I did, grab the runner machine's public key ahead of time and write it into the target machine whenever a new trust relationship is needed: ssh 10.0.0.1 "echo \"XXX\" >> ~/.ssh/authorized_keys"
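The ssh-copy-id route, for reference (the user and host are placeholders):

# run as gitlab-runner on the runner machine
ssh-keygen -t rsa          # only needed once, if no key pair exists yet
ssh-copy-id user@10.0.0.1  # appends the public key to the target's authorized_keys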
The general configuration is as follows:
variables:
  DEPLOY_TO: /home/XXX/repo # target project path on the deployment server

deploy:
  stage: deploy
  script:
    - rsync -e "ssh -o StrictHostKeyChecking=no" -arc --exclude-from="./exclude.list" --delete . 10.0.0.1:$DEPLOY_TO
    - ssh 10.0.0.1 "cd $DEPLOY_TO; npm i --only=production"
    - ssh 10.0.0.1 "pm2 start $DEPLOY_TO/pm2/$CI_ENVIRONMENT_NAME.json;"
Values declared under variables can be referenced in the scripts below.
The last line, ssh 10.0.0.1 "pm2 start $DEPLOY_TO/pm2/$CI_ENVIRONMENT_NAME.json;", restarts the service. We manage processes with pm2, and by convention the pm2 folder in the project root holds the startup parameters for each environment.
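For illustration, such a per-environment file (say pm2/production.json; the app name and entry script are hypothetical) might look like:

{
  "apps": [{
    "name": "my-service",
    "script": "./dist/index.js",
    "env": {
      "NODE_ENV": "production"
    }
  }]
}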
Of course, what we actually use today is not quite this simple; more on that below.
We also do some extra processing at the deploy stage. This is an important point: we may want finer control over when things go live, so the deploy job is not executed automatically; we change it to run only when triggered manually, using another configuration parameter:
deploy:
  stage: deploy
  script: XXX
  when: manual # run this job only when triggered manually
Of course, drop this option if you do not need it; for example, we leave it out in the test environment and only set it for the production environment.
Easier management of CI/CD processes
If you follow the configuration above, you already have a usable CI/CD setup covering the whole process.
However, it is not very maintainable, especially when CI/CD is applied across many projects: making one change means every project has to update its configuration and push it to the repository before the change takes effect.
So we opted for a more flexible approach, and ended up with a CI/CD configuration file that looks roughly like this (some irrelevant configuration omitted):
variables:
  SCRIPTS_STORAGE: /home/gitlab-runner/runner-scripts
  DEPLOY_TO: /home/XXX/repo # target project path on the deployment server

stages:
  - install
  - test
  - build
  - deploy_development
  - deploy_production

install_dependencies:
  stage: install
  script: bash $SCRIPTS_STORAGE/install.sh

unit_test:
  stage: test
  script: bash $SCRIPTS_STORAGE/test.sh

eslint:
  stage: test
  script: bash $SCRIPTS_STORAGE/eslint.sh

# compile the TS files
build:
  stage: build
  script: bash $SCRIPTS_STORAGE/build.sh

deploy_development:
  stage: deploy_development
  script: bash $SCRIPTS_STORAGE/deploy.sh 10.0.0.1
  only: dev # valid branch specified per job

deploy_production:
  stage: deploy_production
  script: bash $SCRIPTS_STORAGE/deploy.sh 10.0.0.2
  only: master # valid branch specified per job
We keep the scripts for each CI/CD step on the runner's server, and the configuration file only invokes those script files. So when policies change (ESLint rules, deployment patterns and so on), the changes are completely decoupled from the projects; most subsequent adjustments require no action at all from projects already using CI/CD (except for the few that introduce new environment variables, which do need project-side support).
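As an illustration, a deploy.sh along these lines could be assembled from the commands shown earlier (a sketch; the target host is passed as the first argument, and DEPLOY_TO and CI_ENVIRONMENT_NAME come from the CI environment):

#!/bin/bash
# usage: deploy.sh <target-host>
TARGET="$1"

# sync the project to the target machine
rsync -e "ssh -o StrictHostKeyChecking=no" -arc \
  --exclude-from="./exclude.list" --delete . "$TARGET:$DEPLOY_TO"

# install production dependencies and (re)start the service with pm2
ssh "$TARGET" "cd $DEPLOY_TO; npm i --only=production"
ssh "$TARGET" "pm2 start $DEPLOY_TO/pm2/$CI_ENVIRONMENT_NAME.json"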
Adding DingTalk notifications
When CI/CD succeeds or fails we can see it on the Pipeline page, and email notifications can be set up, but neither is very timely. Since we currently use DingTalk to communicate at work, we set up a DingTalk bot. DingTalk does offer a GitLab bot, but its features did not fit: it only handles issues and the like, with no CI/CD notifications, so we implemented our own on top of the plain message template.
Because each step above is already wrapped in a script, this change was invisible to colleagues: we only had to modify the corresponding script files and add the DingTalk calls, wrapped in a simple function:
function sendDingText() {
  local text="$1"

  curl -X POST "$DINGTALK_HOOKS_URL" \
    -H 'Content-Type: application/json' \
    -d '{ "msgtype": "text", "text": { "content": "'"$text"'" } }'
}

# pass the message text as the argument when sending
sendDingText "proj: $CI_PROJECT_NAME[$CI_JOB_NAME]\nenv: $CI_ENVIRONMENT_NAME\ndeploy success\n$CI_PIPELINE_URL\ncreated by: $GITLAB_USER_NAME\nmessage: $CI_COMMIT_MESSAGE"

# on failure — customize how much detail you need for each case
sendDingText "error: $CI_PROJECT_NAME[$CI_JOB_NAME]\nenv: $CI_ENVIRONMENT_NAME"
All the environment variables used above are provided by GitLab Runner, except DINGTALK_HOOKS_URL, which is our custom bot's webhook address.
The full list of variables can be found here: predefined variables
Rollback handling
So much for the normal procedure; now for what happens when things go wrong. To err is human, and there is a good chance that some unconsidered case will break the service. The first priority then is to keep the service usable, so we roll back to the last valid version: on the project's Pipeline or Environment page, pick the node to roll back to and re-run its CI/CD job, and the rollback is done.
This is a bit trickier in TypeScript projects, though, because a rollback is normally just a re-run of a previous pipeline's deploy job. In a TS project, we cache the dist folder produced when TS is compiled to JS on the runner, and deployment pushes that folder straight to the server (the TS source is no longer pushed).
If we just hit Retry, we have a problem: the dist folder is cached, and deploy does nothing about it — it simply pushes the files and restarts the service, so dist still contains the JS from the most recent build. There are two ways to solve this:
- Re-run the build job manually before retrying deploy
- Add a check inside deploy itself
The first option is clearly not workable, since it depends entirely on whether whoever is doing the release knows about this quirk. So we solved it with the second.
The deploy script needs to know, at execution time, whether the content of the dist folder is what we want. That calls for an identifier, and the simplest, most effective one is the git commit ID: every commit has a unique hash, and CI/CD only runs on newly pushed code (so there is always a commit). So during build we cache the current commit ID:
git rev-parse --short HEAD > git_version
Then add some extra checking logic to the deploy script:
currentVersion=`git rev-parse --short HEAD`
tagVersion=`touch git_version; cat git_version`

if [ "$currentVersion" = "$tagVersion" ]; then
  echo "git version match"
else
  echo "git version not match, rebuild dist"
  bash ~/runner-scripts/build.sh # run the build script again
fi
This way, you avoid the risk of still deploying the wrong code when you roll back.
As for why the build step is not simply merged into deploy: we have many machines and many deploy jobs (deploy_1, deploy_2, deploy_all and so on). If build lived inside deploy, then a single release — because we deploy to each machine with a separate job — would rebuild the same artifacts multiple times, adding needless time cost.
Hotfix handling
After running CI/CD for a while, we found that fixing online bugs was still occasionally slow, because we had to wait for the full CI/CD process after pushing the fix. So after some discussion we decided that for certain hotfixes we should skip ESLint and the unit tests, fix the code quickly, and get it online.
CI/CD can behave differently for certain tags, but we did not want to go that way, for two reasons:
- It would require modifying the configuration file (of every project)
- It would require developers to know the corresponding rules (to tag releases)
So we took a different approach. Our branches only go live through merge requests, so their commit titles have a fixed shape: Merge branch 'XXX'. CI/CD also exposes an environment variable with the commit message of the current run, so we match against that string to decide whether to skip these jobs:
function checkHotFix() {
  local count=`echo $CI_COMMIT_TITLE | grep -E "^Merge branch '(hot)?fix/\w+" | wc -l`

  if [ $count -eq 0 ]; then
    return 0
  else
    return 1
  fi
}

# how to use it
checkHotFix

if [ $? -eq 0 ]; then
  echo "start eslint"
  npx eslint --ext .js,.ts .
else
  # skip this step
  echo "match hotfix, ignore eslint"
fi
This guarantees that when a branch named hotfix/XXX or fix/XXX is being merged, CI/CD skips the extra code checks and goes straight to deployment. Dependency installation is not skipped, because the TS compilation still needs those packages.
Summary
More than half of the team's projects are now on the CI/CD process. To make onboarding easier for colleagues (mainly editing the .gitlab-ci.yml file), we also provide a scaffold that quickly generates the configuration file (and automatically establishes the machine trust relationships).
Compared with before, deployment is noticeably faster and no longer depends on the local network: once the code is pushed to the remote repository, everything that follows is out of your hands, and small-traffic releases (deploying a single machine for validation) are easy.
Rollbacks are also more flexible: we can switch quickly between versions, and doing it through the UI makes the operation much more intuitive.
At the end of the day, the development pattern is tolerable without CI/CD, but once you have used CI/CD and then go back to the old deployment, it feels distinctly uncomfortable. (No comparison, no harm 😂)
Complete process description
- Install dependencies
- Code quality checks
  - ESLint check
    - Check whether this is a hotfix branch; if so, skip this step
  - Unit tests
    - Check whether this is a hotfix branch; if so, skip this step
- Compile the TS files
- Deploy and go live
  - Check whether the cached dist directory is valid; if not, redo the TS compilation from step 3
- Send a DingTalk notification after going live
The next thing to do
Adding CI/CD is only the first step. With the deployment and release process unified, other things become easier: for example, verifying that the service's interfaces respond correctly after a release, and automatically rolling back and redeploying if errors are found; or moving to Docker. To a large extent, these adjustments are transparent to project maintainers.
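A possible sketch of such a post-release check (the health endpoint and the follow-up action are hypothetical):

#!/bin/bash
# probe a health endpoint after deployment and alert on failure
status=`curl -s -o /dev/null -w "%{http_code}" "http://10.0.0.1/health"`

if [ "$status" != "200" ]; then
  sendDingText "error: health check returned $status"
  # this is where an automatic rollback / redeploy would be triggered
fi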
References
- GitLab CI/CD
- DingTalk custom bot