Abstract

The term DevOps is a combination of Development and Operations; it means connecting the development, testing, and operations stages of the software delivery process through a tool chain. With automated testing and monitoring, a team can waste less time and deliver products more efficiently and stably.

As a project grows larger, its features and maintainers multiply, and the conflict between feature delivery frequency and software quality becomes increasingly acute. Balancing the two has become an urgent focus for the team, which is why implementing a complete DevOps tool chain is now on the agenda.

We believe that everything from code integration and functional testing to deployment, release, and infrastructure management should be covered by thorough automated monitoring, and that human intervention should be avoided as much as possible. Only then can software balance quality and efficiency, and remain reliable while release frequency increases. This is the ultimate goal of every successful large-scale project.

This article will focus on the capabilities that DevOps needs to provide in the continuous integration phase, and will give a brief introduction to workflow design and pipeline optimization ideas.

When we talk about CI, what are we talking about?

Continuous Integration (CI) refers to the practice of integrating code into the trunk frequently (multiple times a day).

Note that this covers two things: continuously merging code into the trunk, and continuously turning that source code into artifacts for actual use. Therefore, CI needs to automate quality assurance of the code and transform the build output into usable artifacts for the next phase.

Therefore, the CI stage needs to cover at least the following steps:

  1. Static code inspection

This includes ESLint/TSLint static syntax checks, verifying that git commit messages comply with the team's specification, and ensuring that committed files have owners who can review them. These static checks only need to scan the source code and require no compile step. (A minimal commit-message check is sketched after this list.)

  2. Unit testing / integration testing / E2E testing

Automated testing is the key to product quality. Test coverage and test case quality directly determine the quality of the build artifact, so a comprehensive and complete test suite is essential for continuous delivery.

  3. Compile and organize the artifacts

In small and medium-sized projects this step is often omitted and the build output is handed straight to deployment. For a large project, however, multiple frequent commits produce a large number of build artifacts that need to be managed properly. We will say more about turning build output into deployable artifacts below.
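As an illustration of the static checks above, here is a minimal sketch (not from the article) of a commit-message check that a CI job could run before any compile step. The allowed types and the "type(scope): subject" pattern are assumptions, not the team's actual specification.

// check-commit-msg.ts - a sketch of a static commit-message check run in CI.
// The allowed types and the "type(scope): subject" pattern are assumptions,
// not the actual specification used by the team described in this article.
import { execSync } from "child_process";

const PATTERN = /^(feat|fix|docs|refactor|test|chore)(\([\w-]+\))?: .{1,72}$/;

// Read the subject line of the commit being checked (HEAD by default).
const subject = execSync("git log -1 --pretty=%s", { encoding: "utf8" }).trim();

if (!PATTERN.test(subject)) {
  console.error(`Commit message "${subject}" does not match the required format.`);
  process.exit(1);
}
console.log("Commit message check passed.");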

Integration workflow design

Before formally adopting CI, we need to plan a new workflow that adapts to the problems and difficulties the project may face when switching to high-frequency integration. The transformation touches many levels; besides encouraging developers to change their habits and training them in the new process, our main concern is how updates to the source repository trigger the continuous integration steps.

Organizing the pipeline

We need a proper way to organize and manage which tasks a CI pipeline should perform at which stages.

There are many CI tools on the market to choose from. Look closely and you will find that both emerging lightweight tools like Drone and the veteran Jenkins support, either natively or via plug-ins, the same feature: Configuration as Code, i.e. managing the pipeline with configuration files.

The benefits are considerable. First, it eliminates the need for a dedicated web page for editing pipelines, which reduces maintenance costs on the platform side. Second, for users, the pipeline configuration lives in the source repository and evolves together with the source code, so the CI process itself can be standardized and audited through git version management.

Once the pipeline organization is settled, we also need to consider the release model and the branching strategy of the source repository, which directly determine how we plan the code integration pipeline.

Release model trade-off

As mentioned in Continuous Delivery 2.0, a release model has three elements: delivery time, number of features, and delivery quality.

These three constrain one another. With relatively fixed development manpower and resources, we can only guarantee two of them.

The traditional project-based release model sacrifices delivery time: it waits until all features are fully developed and fully manually tested before a new release is cut. This leads to a long lead time and, because so many features are involved, a higher risk that development gets out of control, so the release may not ship on time. It does not meet the continuous delivery requirements of a mature large-scale project.

With continuous integration, once our integration frequency is high enough and automated testing is mature and stable enough, we no longer need to dump every feature into a single release. Each feature is automatically tested as it is developed, and once finished it is closed out and ready for release. Features that are already stable and waiting can then be published automatically at a fixed time window. For modern large projects with ever more frequent and shorter release cycles, this is close to an optimal solution.

Branching strategy

Like most teams, we originally used branch-based development with trunk-based releases, and our branching strategy was the most mature and widely adopted one in the industry: Git-flow.

As you can see, this model covers feature development, bug fixes, releases, and even hotfixes, and it is a workflow that can be used in production. But the overall structure is extremely complex and hard to manage. To make a hotfix, for example, you pull a hotfix branch from the trunk as it was at the latest release, fix the problem and merge it into the develop branch, wait for the next release to pull it into the release branch, and finally merge it back to the trunk once the release is complete.

In addition, Git-flow imposes no strict deadline for merging each feature branch, so for a large feature the interval between merges can be very long. That means many conflicts to resolve at merge time, which stretches the schedule unreasonably. Anyone who has done a large-scale migration or refactor will know this pain well.

In view of this, we decided to boldly adopt trunk-based development with trunk-based releases as our branching strategy.

We ask members of the development team to commit their branch code to the trunk on a daily basis as far as possible. When the release condition is met, a release branch is pulled directly from the trunk for publishing. If defects are found, they are fixed directly on the trunk and cherry-picked into the release branch of the corresponding version as needed.

As a result, developers only need to pay attention to two branches: the trunk and their own working branch, and git push and merge are enough for all branch operations. At the same time, because merges happen much more frequently, the number of conflicts each person has to resolve drops sharply, which removes a real pain point for many developers.

To be clear, there is no silver bullet in branching strategies and release models, and the strategy we adopted may not suit every team's project. Merging as frequently as possible lets the product iterate quickly, but it undoubtedly makes it hard to fully test and verify new features manually.

Resolving this tension requires strong infrastructure and long-term habit building. The following difficulties are worth weighing when deciding whether trunk-based development is right for you.

  1. Complete and fast automated testing. Only when unit, integration, and E2E test coverage is very high, and the quality of the test cases (as verified through mutation testing) is high, can the overall quality of the project be guaranteed. This requires every developer on the team to get used to TDD (test-driven development), which takes a long time to build into the engineering culture.

  2. A code review mechanism with owner responsibility. Giving developers a sense of ownership and having them review, line by line, changes to the modules they are responsible for avoids many breaking changes and architectural pitfalls. Here, too, the difficulty is essentially cultivating developer habits.

  3. Significant infrastructure spending. High-frequency automated testing is very resource-intensive, especially E2E testing, where every test case is backed by a headless browser. On top of that, multi-core machines are needed to run tests in parallel and keep them fast. Each of these is a large resource investment.

  4. Fast, stable rollback, plus accurate production and gray-release (canary) monitoring. Only with highly automated full-link monitoring can new versions released under this model be guaranteed to run stably. We will describe this in detail in a future article.

From build output to artifacts in large projects

In most projects, once the code has been compiled into build output, deployment means logging into the release machine and copying the output onto it each time. The generated static files can coexist because their hashes differ, while the HTML is updated by overwriting it in place.

This makes the update history hard to audit and trace, and the correctness of such changes is hard to guarantee.

In addition, when we need to roll back, there is no HTML for the historical version on the server, so rolling back means recompiling and repackaging the old version and overwriting again. That rollback speed is clearly not satisfactory.

One solution is to never overwrite files: all artifacts are uploaded to persistent storage, and a traffic distribution service added upstream of the request decides which version of the HTML file should be returned for each request.

For a large project, the HTML returned is not always identical: it may be injected with channel or user-specific identifiers, or with the first-screen data required by SSR, which changes its final form. We therefore believe the HTML artifact should be served by a separate dynamic service that applies some logic to the HTML template and then outputs the result.
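The article does not include code for such a service; below is a minimal sketch, assuming Express is used, that template HTML files are stored per version under ./templates/<version>/index.html, and that the upstream gateway passes the chosen version in a hypothetical x-release-version header.

// html-server.ts - a sketch of a dynamic HTML artifact service.
// Assumptions (not from the article): Express, per-version templates under
// ./templates/<version>/index.html, and an x-release-version header set by the gateway.
import express from "express";
import { promises as fs } from "fs";
import path from "path";

const app = express();

app.get("*", async (req, res) => {
  // The upstream traffic distribution service decides the version; fall back to "latest".
  const version = (req.header("x-release-version") ?? "latest").replace(/[^\w.-]/g, "");
  const templatePath = path.join(__dirname, "templates", version, "index.html");
  try {
    let html = await fs.readFile(templatePath, "utf8");
    // Inject channel / user identifiers and first-screen (SSR) data into template placeholders.
    html = html
      .replace("__CHANNEL__", String(req.query.channel ?? "default"))
      .replace("__INITIAL_STATE__", JSON.stringify({ path: req.path }));
    res.type("html").send(html);
  } catch {
    res.status(404).send("Unknown release version");
  }
});

app.listen(8080);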

To summarize, after each build the output is processed as follows to produce the final front-end artifacts:

  1. Static files such as CSS and JS are published to cloud object storage, which serves as the origin for the CDN to optimize access speed.

  2. The HTML artifact is wrapped in a service that serves it directly, packaged as a Docker image at the same level as the back-end microservice images, so that the upstream traffic distribution service (gateway) can choose which service instance handles each user request.

Speed is efficiency: pipeline optimization ideas

A good tool may be complex inside, but it must be simple and friendly for its users.

With integration as frequent as trunk-based development demands, integration speed is efficiency: pipeline execution time is the developers' biggest concern and the decisive measure of pipeline usability. There are several things we can do to speed up the pipeline and reduce developer waiting time.

Pipeline task scheduling

The tasks executed at each stage of the pipeline should be arranged according to a few principles: tasks with no prerequisites run first, tasks with short execution times run first, and unrelated tasks run in parallel.

Following these principles, we can analyze every task in the pipeline, run a dependency analysis over each task's prerequisite chain, and derive its earliest possible start time (a small sketch of this computation follows).
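As a small illustration (not from the article), the sketch below computes each task's earliest start time from its prerequisites and estimated durations; the task names and durations are made up.

// schedule.ts - computing the earliest start time of each pipeline task.
interface Task {
  name: string;
  duration: number;     // estimated execution time in seconds
  dependsOn: string[];  // names of prerequisite tasks
}

function earliestStartTimes(tasks: Task[]): Map<string, number> {
  const byName = new Map<string, Task>();
  tasks.forEach((t) => byName.set(t.name, t));
  const start = new Map<string, number>();
  const resolve = (name: string, seen = new Set<string>()): number => {
    if (start.has(name)) return start.get(name)!;
    if (seen.has(name)) throw new Error(`Dependency cycle at ${name}`);
    seen.add(name);
    const task = byName.get(name)!;
    // A task can start once all of its prerequisites have finished.
    const earliest = task.dependsOn.length
      ? Math.max(...task.dependsOn.map((d) => resolve(d, seen) + byName.get(d)!.duration))
      : 0;
    start.set(name, earliest);
    return earliest;
  };
  tasks.forEach((t) => resolve(t.name));
  return start;
}

// Example: lint and unit tests have no prerequisites and run in parallel;
// the build waits for both; E2E waits for the build.
console.log(earliestStartTimes([
  { name: "lint", duration: 60, dependsOn: [] },
  { name: "unit", duration: 300, dependsOn: [] },
  { name: "build", duration: 240, dependsOn: ["lint", "unit"] },
  { name: "e2e", duration: 900, dependsOn: ["build"] },
]));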

Making clever use of the Docker cache

Docker provides a useful feature: while building an image, each instruction in the Dockerfile produces a new image layer, which is cached. On the next build, Docker checks its cache layer by layer; whenever a layer is hit, the cached layer is reused directly, which greatly reduces the time spent on repeated builds.

We can exploit this feature of Docker to cut out steps that are otherwise repeated in the pipeline, improving the execution efficiency of CI.

Take npm install, usually the most time-consuming dependency installation step in a front-end project. Under high-frequency integration, dependency changes are actually rare, so we can package the node_modules folder into an image for the next build to reuse. An example Dockerfile looks like this:

# Dependency stage: install node_modules once so later builds can reuse the cached image
FROM node:12 AS dependencies
WORKDIR /ci
# Copy the project and install dependencies
COPY . .
RUN npm install
ENV NODE_PATH=/ci/node_modules

We added a cache-hit check to the pipeline: before the next build, check whether the cached image exists. To make sure the dependencies of the current build have not changed, we also compare the MD5 of the current package-lock.json with the one recorded in the cached image. If they differ, dependencies are reinstalled and a new image is built and cached; if they match, the node_modules folder is taken directly from the image, saving a lot of dependency installation time. (A sketch of such a check follows.)
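The article does not show the check itself; below is a sketch of one way to implement it, a variation in which the MD5 of package-lock.json is used directly as the image tag, so a cache hit simply means an image with that tag already exists locally. The deps-cache image name is hypothetical.

// check-deps-cache.ts - a sketch of the package-lock.json dependency cache check.
// Variation on the article's approach: the MD5 is used as the image tag, so a cache
// hit means an image with that tag already exists locally. "deps-cache" is hypothetical.
import { createHash } from "crypto";
import { execSync } from "child_process";
import { readFileSync } from "fs";

const IMAGE = "deps-cache";
const md5 = createHash("md5").update(readFileSync("package-lock.json")).digest("hex");
const tag = `${IMAGE}:${md5}`;

function imageExists(name: string): boolean {
  try {
    execSync(`docker image inspect ${name}`, { stdio: "ignore" });
    return true;
  } catch {
    return false;
  }
}

if (imageExists(tag)) {
  console.log(`Cache hit: reuse ${tag} and skip npm install`);
} else {
  console.log(`Cache miss: rebuilding dependency image ${tag}`);
  // Rebuild the dependency image with the Dockerfile shown above and tag it by hash.
  execSync(`docker build -t ${tag} .`, { stdio: "inherit" });
}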

In the pipeline, the folder can then be pulled from that image as follows, where --from is followed by the alias of the cached dependency image built earlier:

# Copy node_modules from the cached dependencies image (source paths are relative to its root)
COPY --from=dependencies /ci/node_modules ./node_modules
# Perform other steps

Similarly, we can extend this idea to any CI task that changes rarely but takes a long time to produce: installing Linux environment dependencies, the cache prepared before unit test cases run, and even copying folders with large numbers of static files can be almost entirely skipped via the Docker cache, shortening integration time. The principle is the same, so I will not repeat it here.

Hierarchical builds

It is well known that a pipeline's execution time inevitably grows as the number of tasks increases. In a large project, the number of test cases keeps growing as coverage and quality metrics are pursued, and sooner or later the running time reaches a point we can no longer stand.

However, the number of test cases determines, to a large extent, the quality of the project, so quality checks cannot be cut. Is there a way to reduce the time developers wait for integration while keeping project quality? The answer is the hierarchical build.

A hierarchical build splits the CI pipeline into a primary build and a secondary build. The primary build runs every time code is submitted, and nothing further can happen until its checks pass. The secondary build does not block the workflow: it continues to run in the background after the code has been merged. However, if the secondary build fails, the pipeline immediately raises an alert and blocks any other code from entering until the problem is fixed.

There are a few guidelines for whether a task should go into the secondary build process:

  1. The secondary build should contain tasks that take a long time to run (say, more than 15 minutes) and consume a lot of resources, such as the E2E tests in the automated test suite.

  2. The secondary build should contain tasks with low priority or a low probability of failure, and should avoid covering critical paths. If certain automated test cases turn out to fail frequently, consider adding unit tests for that functionality and moving those checks into the primary build.

  3. If the secondary build still takes too long, consider splitting the test cases appropriately and running them in parallel (a small sharding sketch follows this list).
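As one possible way to split test cases (a sketch, not the team's actual tooling), the snippet below deterministically assigns test files to N shards so that each parallel CI machine runs only its own share.

// shard-tests.ts - deterministically splitting test files across parallel CI machines.
import { createHash } from "crypto";

function assignShards(testFiles: string[], shardCount: number): string[][] {
  const shards: string[][] = Array.from({ length: shardCount }, () => []);
  for (const file of [...testFiles].sort()) {
    // Hash the file path so the assignment is stable from build to build.
    const hash = createHash("md5").update(file).digest();
    shards[hash.readUInt32BE(0) % shardCount].push(file);
  }
  return shards;
}

// Example: the job with SHARD_INDEX=0..3 runs only its own quarter of the E2E suite.
const shards = assignShards(["a.e2e.ts", "b.e2e.ts", "c.e2e.ts", "d.e2e.ts"], 4);
console.log(shards[Number(process.env.SHARD_INDEX ?? 0)]);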

Conclusion

As the saying goes, to do a good job one must first sharpen one's tools. Behind the Tencent Docs project's high-frequency yet stable releases, there has to be strong infrastructure support.

This article has only covered the project's transformation in the continuous integration stage. The concrete ideas for the continuous deployment and continuous operations stages will be explained in detail in follow-up articles. We welcome further discussion, as well as suggestions and corrections for anything that can be improved or is simply wrong.

References

  1. Continuous Delivery 2.0. By Qiao Liang
  2. www.redhat.com/zh/topics/d…
  3. www.36kr.com/p/121837544…
