Stability requirements keep growing, and one of the applications I maintain is in the middle of a migration. Some measures are needed to make the upgrade and migration smooth, and grayscale publishing is a feasible option.

What is grayscale publishing

Baidu’s explanation goes like this:

Grayscale publishing is a release method that transitions smoothly between black and white. An A/B test is one kind of grayscale release: some users keep using A while others start using B. If users have no objections to B, the scope is gradually expanded until all users have been migrated to B. Grayscale publishing helps keep the whole system stable, because problems can be found and fixed early in the grayscale phase, which limits their impact.

From this explanation, it is easy to see that grayscale publishing involves three key points:

  1. Deploy version A and version B side by side
  2. Select the test users and shift traffic to B gradually
  3. Switching the test users to version B does not affect the other users, who keep using version A
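Points 2 and 3 can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed implementation: it assumes every user has a stable string ID that is hashed into one of 100 buckets, and `bucket`, `choose_version` and the salt value are hypothetical names.

```python
import hashlib


def bucket(user_id: str, salt: str = "release-2") -> int:
    """Map a user ID to a stable bucket in [0, 100); `salt` is a hypothetical per-release value."""
    digest = hashlib.sha256(f"{salt}:{user_id}".encode()).hexdigest()
    return int(digest[:8], 16) % 100


def choose_version(user_id: str, gray_percent: int) -> str:
    """Send users whose bucket is below gray_percent to B; everyone else stays on A."""
    return "B" if bucket(user_id) < gray_percent else "A"


# Expanding the gray scope only adds users to B: anyone already on B stays there,
# because a user's bucket never changes between requests.
for percent in (5, 20, 50, 100):
    on_b = sum(choose_version(f"user-{i}", percent) == "B" for i in range(1000))
    print(f"{percent}% gray -> {on_b} of 1000 users on B")
```

Hashing instead of random sampling is the design choice here: a user's version stays the same across requests, so the users left on A are never bounced back and forth.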

The main benefits of grayscale publishing are also clear:

  1. Reduce the impact of releases. Even with thorough testing in the daily and pre-release environments, there is no guarantee that production will be fine: the test environments are never exactly the same as production, and testing cannot cover everything.
  2. Verify the effect of the new version by comparing it with the old version.

How to do gray release

Before diving into grayscale publishing, let’s look at how releases work without it:

Direct replacement deployment

If the changes are compatible, the usual order is to release the underlying services first and the upper-layer applications after them. The front-end CDN is a special case: it is published before the Web layer in non-overwriting mode (the CDN assets are essentially files in OSS; each release increments the CDN version by 1 and keeps the previous versions). Finally, the Web layer is released pointing at the latest CDN version.
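A minimal sketch of the non-overwriting CDN publish, assuming an in-memory dict stands in for OSS and `cdn.example.com` is a placeholder domain; `CdnStore` and `render_page` are hypothetical names, not part of any real SDK.

```python
class CdnStore:
    """Hypothetical stand-in for OSS behind a CDN: versions are added, never overwritten."""

    def __init__(self) -> None:
        self.versions: dict[int, dict[str, str]] = {}

    def publish(self, files: dict[str, str]) -> int:
        """Upload a build as version N+1; earlier versions stay untouched."""
        new_version = max(self.versions, default=0) + 1
        self.versions[new_version] = dict(files)
        return new_version

    def url(self, version: int, path: str) -> str:
        if path not in self.versions.get(version, {}):
            raise KeyError(f"{path} is not in CDN version {version}")
        return f"https://cdn.example.com/app/{version}/{path}"


def render_page(cdn: CdnStore, version: int) -> str:
    """The Web layer is released last and simply points at one CDN version."""
    return f'<script src="{cdn.url(version, "main.js")}"></script>'


cdn = CdnStore()
old = cdn.publish({"main.js": "console.log('v1')"})
new = cdn.publish({"main.js": "console.log('v2')"})
print(render_page(cdn, new))
```

Because older versions are kept, rolling the front end back only means releasing the Web layer with an older version number.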

Applications commonly introduce a configuration server to manage configuration items, so that changes can be made without redeploying the application. For example, the front end can be released simply by switching the CDN version number on the configuration server.
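The configuration-server approach can be sketched like this. The in-memory `ConfigServer` is a hypothetical stand-in for a real configuration service, and the key name `cdn.version` and the version numbers are assumptions for illustration.

```python
class ConfigServer:
    """Hypothetical in-memory stand-in for a configuration server."""

    def __init__(self, items: dict[str, str]) -> None:
        self._items = dict(items)

    def get(self, key: str) -> str:
        return self._items[key]

    def set(self, key: str, value: str) -> None:
        self._items[key] = value


def script_tag(config: ConfigServer) -> str:
    # The CDN version is looked up on every render, so changing the config item
    # releases (or rolls back) the front end without redeploying the Web layer.
    version = config.get("cdn.version")
    return f'<script src="https://cdn.example.com/app/{version}/main.js"></script>'


config = ConfigServer({"cdn.version": "41"})
config.set("cdn.version", "42")  # front-end release
print(script_tag(config))
config.set("cdn.version", "41")  # rollback
```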

Rolling deployment and blue-green deployment

Most applications now run on multiple machines behind a load balancer. A release then proceeds a few machines at a time: remove them from the load balancer, update them, and add them back to the load balancer. This is rolling deployment. Blue-green deployment is similar; the difference is that you first upgrade one group of machines, have the load balancer direct traffic to that new group, and then upgrade the remaining machines and reconnect them to the load balancer.
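Rolling deployment can be sketched as a loop over batches. This is a toy model under stated assumptions: `LoadBalancer` is a hypothetical class whose `active` set decides which machines receive traffic, and machine versions are tracked in a plain dict.

```python
from dataclasses import dataclass, field


@dataclass
class LoadBalancer:
    """Hypothetical load balancer: traffic only goes to machines in `active`."""
    active: set[str] = field(default_factory=set)


def rolling_deploy(lb: LoadBalancer, versions: dict[str, str],
                   new_version: str, batch_size: int) -> list[list[str]]:
    """Upgrade machines a batch at a time and return the batches in order."""
    machines = sorted(versions)
    batches = [machines[i:i + batch_size] for i in range(0, len(machines), batch_size)]
    for batch in batches:
        lb.active -= set(batch)              # 1. take the batch out of rotation
        for machine in batch:
            versions[machine] = new_version  # 2. upgrade it
        lb.active |= set(batch)              # 3. let the load balancer route to it again
    return batches


lb = LoadBalancer(active={"m1", "m2", "m3", "m4", "m5"})
versions = {m: "v1" for m in lb.active}
print(rolling_deploy(lb, versions, "v2", batch_size=2))  # [['m1', 'm2'], ['m3', 'm4'], ['m5']]
```

While one batch is out of rotation, the other machines keep serving traffic, which is why the batch size should stay well below the total number of machines.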

Incompatible change

All of the above assumes compatible changes. Sometimes, however, we need to make a breaking change that requires the front end and back end to be released at the same time; otherwise the service is unavailable for a while. With the most primitive deployment, where the Web layer ships with the latest front-end code (or CDN version number), this would be easy. A configuration server makes most scenarios convenient, but it is not friendly here: because it decouples the front-end release from the back-end release, there is a short window of unavailability between the two releases. To release incompatible changes smoothly, the common practice is to introduce a version number: upgrade some machines to v2 so that the /v1/ and /v2/ APIs coexist, switch the front-end application to the new /v2/ APIs, and remove the v1 code when appropriate after the release.
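A minimal sketch of the /v1/ and /v2/ APIs coexisting. The `/user` endpoint, its fields and the handler names are hypothetical; the point is only that both contracts stay routable during the transition.

```python
def get_user_v1(user_id: int) -> dict:
    # Old contract, kept alive for clients that have not switched yet.
    return {"id": user_id, "name": "Ada Lovelace"}


def get_user_v2(user_id: int) -> dict:
    # Breaking change: the single name field is split in two.
    return {"id": user_id, "firstName": "Ada", "lastName": "Lovelace"}


# Both versions are served side by side during the transition.
ROUTES = {
    "/v1/user": get_user_v1,
    "/v2/user": get_user_v2,
}


def handle(path: str, user_id: int) -> dict:
    handler = ROUTES.get(path)
    if handler is None:
        raise LookupError(f"no route for {path}")
    return handler(user_id)


print(handle("/v2/user", 7))  # what the upgraded front end calls
```

Once the front end only calls /v2/, the `/v1/user` route and `get_user_v1` can be deleted.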

Gray release classification

Putting the definition of grayscale publishing together with the non-grayscale deployment methods above, the outline of how to do grayscale publishing is already clear. The core is to upgrade some of the machines and direct some users to the new service based on certain conditions. Before introducing specific methods, here is an overall classification: by how traffic is split, grayscale can be physical or logical; by granularity, it can be function-level or application-level.

Physical and logical grayscale

Physical grayscale is relatively simple: it is grayscale along the machine dimension. As with the direct deployment described above, once the new version is deployed on some machines, traffic is spread evenly across the new and old versions. The advantage is that it is easy and requires no extra work. The disadvantage, as mentioned above, is that it does not work for incompatible changes.

Logical grayscale differs from physical grayscale in only one respect: instead of spreading traffic evenly, it routes traffic according to some logic, for example based on who the user is.
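To make the contrast concrete, here is a sketch of both styles. Round-robin stands in for whatever balancing the real load balancer does, and `physical_gray`, `logical_gray`, the machine names and the user list are hypothetical.

```python
import itertools


def physical_gray(machines: dict[str, str]):
    """Round-robin over machines; each request gets whatever version its machine runs."""
    return itertools.cycle(sorted(machines.items()))


def logical_gray(user_id: str, gray_users: set[str]) -> str:
    """Route by rule instead of by machine: only listed users reach the new version."""
    return "v2" if user_id in gray_users else "v1"


# Physical: one machine out of four upgraded -> a quarter of requests hit v2,
# and the same user may land on v1 one time and v2 the next.
requests = itertools.islice(physical_gray({"m1": "v1", "m2": "v1", "m3": "v2", "m4": "v1"}), 8)
print([version for _, version in requests])  # ['v1', 'v1', 'v2', 'v1', 'v1', 'v1', 'v2', 'v1']

# Logical: the decision depends on who the user is, so it is stable per user.
print(logical_gray("alice", {"alice"}), logical_gray("bob", {"alice"}))  # v2 v1
```

The per-request mixing in the physical case is exactly why it breaks down for incompatible changes: one user's front end may talk to v1 and v2 back ends in turn.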

Logical grayscale controls traffic more precisely, and rollback is easy when something goes wrong: just cut the traffic away from the new version. For anyone who wants to go further with A/B testing, it is a must. The downside is that the grayscale-related code usually has to be cleaned up after the grayscale phase ends, for example:

    if (grayCondition) {
        grayService();
    } else {
        stableService();
    }

If the grayscale feature touches multiple places in the code, there will be many such if/else branches. If the grayscale condition needs to process user information, there is even more code (for example, running a grayscale for a specific crowd of users adds crowd-matching logic). Deleting this code after the grayscale ends and releasing again is a hassle. If the removal is postponed until the next requirement comes along, human inertia, the "better to do nothing than to break something" mentality, changes in maintenance staff and similar factors can make the project harder to maintain. To alleviate this problem, we can separate the grayscale logic from the business code.
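One way to separate the grayscale logic, sketched under the assumption that each gray rule is a predicate over a user dict: the rules live in one registry (a stand-in for a shared grayscale-condition module), and business code asks a single question. `GRAY_RULES`, `is_gray`, `checkout` and the city-based crowd rule are all hypothetical.

```python
from typing import Callable

# Hypothetical registry: one predicate per feature under grayscale.
# All crowd-matching logic lives here, not in the business code.
GRAY_RULES: dict[str, Callable[[dict], bool]] = {
    "new-checkout": lambda user: user.get("city") == "Hangzhou",
}


def is_gray(feature: str, user: dict) -> bool:
    """True if the user falls inside the feature's gray crowd; unknown features are never gray."""
    rule = GRAY_RULES.get(feature)
    return rule is not None and rule(user)


def checkout(user: dict) -> str:
    # The business code keeps a single, trivial branch.
    if is_gray("new-checkout", user):
        return "new checkout"
    return "stable checkout"


print(checkout({"city": "Hangzhou"}), "/", checkout({"city": "Beijing"}))  # new checkout / stable checkout
```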

That way, when the grayscale ends, you only need to delete the if/else branch; there is no grayscale-condition logic tangled into the business code to clean up. There is another, trickier situation: what if another feature needs to be released while a grayscale is in progress? Should we finish the previous feature's grayscale and release it in full first, or put the new feature into grayscale at the same time? Features are comparatively easy to handle; an online issue that needs an urgent hotfix is harder. These are all questions worth thinking through.

Function-level and application-level grayscale

Function-level grayscale is easy to understand: it is grayscale applied to a single function, done in the same way as described above. The possible trouble is that several functions may need grayscale at the same time with inconsistent grayscale labels; extracting the grayscale conditions into an SDK can alleviate this. At the application level, the gray servers can be identified by version numbers such as v1 and v2. The front end is a special case because it depends on the server side. Running more than one front-end version is easy (it is just multiple CDN versions coexisting), but the front end should be tied to the same release and the same grayscale as the back end. Following the principle of “never put off till tomorrow what can be done today”, the front-end release process has a betaVersion in addition to the official version. The betaVersion is released first and gray traffic is switched to it for verification; once it checks out, the official version is released and the traffic cut-over is complete.
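The betaVersion flow can be sketched as follows. `FrontendRelease`, `pick_frontend` and `promote` are hypothetical names, `beta_version` mirrors the betaVersion above, and the version strings are made up; how a user is classified as gray is left to whatever grayscale condition is in use.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FrontendRelease:
    version: str                        # official version, served to everyone else
    beta_version: Optional[str] = None  # mirrors betaVersion: published first, gray users only


def pick_frontend(release: FrontendRelease, user_is_gray: bool) -> str:
    """Gray users get the beta build while one exists; everyone else gets the official one."""
    if user_is_gray and release.beta_version is not None:
        return release.beta_version
    return release.version


def promote(release: FrontendRelease) -> None:
    """After verification, the beta becomes the official version and the cut-over is done."""
    if release.beta_version is None:
        raise ValueError("no betaVersion to promote")
    release.version, release.beta_version = release.beta_version, None


release = FrontendRelease(version="1.4.0", beta_version="1.5.0")
print(pick_frontend(release, True), pick_frontend(release, False))  # 1.5.0 1.4.0
promote(release)
print(pick_frontend(release, False))  # 1.5.0
```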

Conclusion

Having covered all this, the approach to grayscale publishing should now be clear. To summarize, the grayscale process can be divided into: define the grayscale strategy -> select users -> deploy the system -> observe the grayscale -> increase/reduce the grayscale percentage -> end the grayscale -> delete the grayscale code. Grayscale publishing has many benefits, but it is not free. If we choose a grayscale approach that suits our own circumstances, we can enjoy its benefits fully while keeping the cost to a minimum.
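The "observe -> increase/reduce" loop in the middle of that process can be sketched as one decision function. This is an illustration under stated assumptions, not a standard algorithm: it assumes the error rate is the metric being observed, and the step schedule (1, 5, 20, 50, 100) and the tolerance are hypothetical values to tune.

```python
def next_gray_percent(current: int, gray_error_rate: float, stable_error_rate: float,
                      steps: tuple[int, ...] = (1, 5, 20, 50, 100),
                      tolerance: float = 0.001) -> int:
    """One 'observe -> increase/reduce' iteration of the grayscale process.

    Move to the next step if the gray group's error rate is no worse than the
    stable group's (plus a tolerance); otherwise cut gray traffic to 0.
    """
    if gray_error_rate > stable_error_rate + tolerance:
        return 0  # roll back: cut all traffic away from the new version
    higher = [s for s in steps if s > current]
    return higher[0] if higher else current  # already at 100%: time to end the grayscale
```

Reaching 100% is the "gray end" step; deleting the grayscale code still has to be done by hand afterwards.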
