Author: Ten Sleep

Review & proofread: Wang Tao, Yi Zhan

Editing & Typesetting: Wen Yan

On November 11 this year, cloud native middleware completed the three-in-one unification of open source, in-house development, and commercialization, and was fully upgraded into middleware cloud products. MSE microservice governance supported the Double 11 traffic peak of Alibaba Group's core business through Dubbo 3.0. To date, 50% of users within the group run their HSF and Dubbo 3.0 applications on MSE microservice governance. Today we take a closer look at the full-link grayscale capability in MSE Service Governance Professional Edition and at some of the scenarios in which it is used in production at scale.

Background

Under a microservice architecture, developing a single requirement often means changing several microservices on the invocation link at the same time. Grayscale release is needed to control the risk and blast radius of bringing the new service versions online. Each microservice usually has a grayscale environment or group that receives grayscale traffic. We want traffic that enters an upstream grayscale environment to also enter the downstream grayscale environments, so that a request stays in the grayscale environment from start to finish. Even if some services on the call link have no grayscale environment, requests leaving those applications should still return to the grayscale environment further downstream. With the full-link grayscale capability provided by MSE, this can be achieved without modifying any business code.

Full-link grayscale features of MSE microservice governance

As the flagship feature of MSE Service Governance Professional Edition, full-link grayscale has the following six characteristics:

  • Fine-grained traffic routing through custom rules

In addition to simple proportional traffic routing, we also support routing Spring Cloud and Dubbo traffic by rule. Spring Cloud traffic can be routed by cookie, header, or param values, or by a random percentage of requests. Dubbo traffic can be routed by service, method, and parameter.

  • Full-link isolated traffic swimlanes
  1. The desired traffic is “dyed” by setting traffic rules, and the dyed traffic is routed to the grayscale machines.

  2. Grayscale traffic carries its grayscale tag downstream, forming a traffic swimlane exclusive to the grayscale environment. For an application that has no grayscale environment, the untagged baseline environment is selected by default.

  • An end-to-end stable baseline environment

Untagged applications are the stable baseline versions, that is, the stable online environment. When we release the grayscale version of the code, we can configure rules to direct specific online traffic to it and so control the risk of the grayscale code.

  • One-click dynamic traffic switching

After traffic rules are defined, you can stop, add, delete, modify, and query them with one click as needed. The rules take effect in real time, which makes directing grayscale traffic more convenient.

  • Low-cost onboarding, implemented with Java Agent technology without modifying a line of business code

MSE microservice governance is implemented with Java Agent bytecode enhancement, which seamlessly supports all Spring Cloud and Dubbo versions released over the past five years. Users can adopt it without changing a line of code or the existing architecture of the business, and can enable or disable it at any time without lock-in. You only need to enable MSE Microservice Governance Professional Edition and configure it online; the configuration takes effect in real time.

  • Lossless online and offline capability for smoother releases

After MSE microservice governance is enabled, applications can go online and offline without losing traffic. Traffic is not lost during releases, rollbacks, scale-out, or scale-in, even under heavy load.

Scenarios from large-scale production practice

This article introduces several commonly used full-link grayscale schemes that MSE microservice governance has summarized and abstracted while supporting key customers in production.

Scenario 1: Automatically dye the traffic passing through a machine to achieve full-link grayscale

  • After a request enters a tagged node, subsequent calls preferentially select nodes with the same tag. In other words, traffic passing through a tagged node is “dyed”.

  • If a tagged invocation link cannot find a node with the same tag, it falls back to an untagged node.

  • When a tagged link passes through an untagged node and then calls an application that has a tagged node, the tagged mode is restored.
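The three rules above can be sketched in a few lines of Python. This is only an illustrative model, not MSE's implementation: the `registry` structure and the function names `pick_instances` and `route_chain` are hypothetical, and each instance is reduced to a name plus an optional tag.

```python
from typing import Dict, List, Optional

# Hypothetical instance record: {"name": ..., "tag": ...}; tag None means baseline.
Instance = Dict[str, Optional[str]]


def pick_instances(instances: List[Instance], request_tag: Optional[str]) -> List[Instance]:
    """Prefer instances whose tag matches the request tag; otherwise fall back to untagged ones."""
    if request_tag is not None:
        tagged = [i for i in instances if i["tag"] == request_tag]
        if tagged:
            return tagged
    return [i for i in instances if i["tag"] is None]


def route_chain(registry: Dict[str, List[Instance]], chain: List[str],
                request_tag: Optional[str]) -> List[str]:
    """Walk the call chain. The tag travels with the request, so it survives untagged hops."""
    path = []
    for service in chain:
        chosen = pick_instances(registry[service], request_tag)[0]
        path.append(chosen["name"])
    return path


# B has no grayscale version in this example registry.
registry = {
    "A": [{"name": "A", "tag": None}, {"name": "Agray", "tag": "gray"}],
    "B": [{"name": "B", "tag": None}],
    "C": [{"name": "C", "tag": None}, {"name": "Cgray", "tag": "gray"}],
}

print(route_chain(registry, ["A", "B", "C"], "gray"))  # ['Agray', 'B', 'Cgray']
print(route_chain(registry, ["A", "B", "C"], None))    # ['A', 'B', 'C']
```

Because the tag is a property of the request rather than of the node that handles it, the hop through untagged B does not lose it, and C's grayscale node is selected again.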

Scenario 2: Add a specific header to the traffic to achieve full-link grayscale

The client adds an identifier for the target environment to the request, and the access layer forwards the request to the gateway of the corresponding environment according to that identifier. That gateway identifies the corresponding isolated project environment through the isolation plug-in, and the request stays in a closed loop within the isolated environment of the business project.

Scenario 3: Customize routing rules to perform full-link grayscale

Add a specified header to grayscale requests and pass that header along the whole call link. You then only need to configure header-based routing rules in the relevant applications: grayscale requests carrying the header enter the grayscale machines, so full-link grayscale can be applied exactly where it is needed.

Full-link grayscale in practice

How can we quickly obtain the full-link grayscale capability described above? Next, we build it step by step from 0 to 1.

We assume that the application architecture consists of Ingress-Nginx and a back-end microservice architecture (Spring Cloud). The back-end call link has three hops: shopping cart (A), transaction center (B), and inventory center (C), which discover each other through the Nacos registry. Users access the back-end services through the client app or an H5 page.

Prerequisites

Install the Ingress-nginx component

Go to the Container Services console, open the application directory, search for ack-ingress-nginx, select the namespace kube-system, and click Create. You will see a Deployment named ack-ingress-nginx-default-controller in the kube-system namespace, indicating a successful installation.

$ kubectl get deployment -n kube-system
NAME                                   READY   UP-TO-DATE   AVAILABLE   AGE
ack-ingress-nginx-default-controller   2/2     2            2           18h

Enable MSE Microservice Governance Professional Edition

  • Click to activate MSE Microservice Governance Professional Edition in order to use the full-link grayscale capability.

  • Go to the Container Services console, open the application directory, search for ack-mse-pilot, and click Create.

  • In the MSE service governance console, open the K8s cluster list, select the corresponding cluster and namespace, and enable microservice governance.

Deploy the Demo application

Save the following content as ingress-gray.yaml and run kubectl apply -f ingress-gray.yaml to deploy the applications. Here we deploy three applications, A, B, and C, each with a baseline version and a grayscale version.

# A application base version
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: spring-cloud-a
  name: spring-cloud-a
spec:
  replicas: 2
  selector:
    matchLabels:
      app: spring-cloud-a
  template:
    metadata:
      annotations:
        msePilotCreateAppName: spring-cloud-a
      labels:
        app: spring-cloud-a
    spec:
      containers:
      - env:
        - name: LANG
          value: C.UTF-8
        - name: JAVA_HOME
          value: /usr/lib/jvm/java-1.8-openjdk/jre
        image: registry.cn-shanghai.aliyuncs.com/yizhan/spring-cloud-a:0.1-SNAPSHOT
        imagePullPolicy: Always
        name: spring-cloud-a
        ports:
        - containerPort: 20001
          protocol: TCP
        resources:
          requests:
            cpu: 250m
            memory: 512Mi
        livenessProbe:
          tcpSocket:
            port: 20001
          initialDelaySeconds: 10
          periodSeconds: 30
# A application gray version
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: spring-cloud-a-new
  name: spring-cloud-a-new
spec:
  replicas: 2
  selector:
    matchLabels:
      app: spring-cloud-a-new
  strategy:
  template:
    metadata:
      annotations:
        alicloud.service.tag: gray
        msePilotCreateAppName: spring-cloud-a
      labels:
        app: spring-cloud-a-new
    spec:
      containers:
      - env:
        - name: LANG
          value: C.UTF-8
        - name: JAVA_HOME
          value: /usr/lib/jvm/java-1.8-openjdk/jre
        - name: profiler.micro.service.tag.trace.enable
          value: "true"
        image: registry.cn-shanghai.aliyuncs.com/yizhan/spring-cloud-a:0.1-SNAPSHOT
        imagePullPolicy: Always
        name: spring-cloud-a-new
        ports:
        - containerPort: 20001
          protocol: TCP
        resources:
          requests:
            cpu: 250m
            memory: 512Mi
        livenessProbe:
          tcpSocket:
            port: 20001
          initialDelaySeconds: 10
          periodSeconds: 30
# B application base version
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: spring-cloud-b
  name: spring-cloud-b
spec:
  replicas: 2
  selector:
    matchLabels:
      app: spring-cloud-b
  strategy:
  template:
    metadata:
      annotations:
        msePilotCreateAppName: spring-cloud-b
      labels:
        app: spring-cloud-b
    spec:
      containers:
      - env:
        - name: LANG
          value: C.UTF-8
        - name: JAVA_HOME
          value: /usr/lib/jvm/java-1.8-openjdk/jre
        image: registry.cn-shanghai.aliyuncs.com/yizhan/spring-cloud-b:0.1-SNAPSHOT
        imagePullPolicy: Always
        name: spring-cloud-b
        ports:
        - containerPort: 8080
          protocol: TCP
        resources:
          requests:
            cpu: 250m
            memory: 512Mi
        livenessProbe:
          tcpSocket:
            port: 20002
          initialDelaySeconds: 10
          periodSeconds: 30
# B application gray version
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: spring-cloud-b-new
  name: spring-cloud-b-new
spec:
  replicas: 2
  selector:
    matchLabels:
      app: spring-cloud-b-new
  template:
    metadata:
      annotations:
        alicloud.service.tag: gray
        msePilotCreateAppName: spring-cloud-b
      labels:
        app: spring-cloud-b-new
    spec:
      containers:
      - env:
        - name: LANG
          value: C.UTF-8
        - name: JAVA_HOME
          value: /usr/lib/jvm/java-1.8-openjdk/jre
        image: registry.cn-shanghai.aliyuncs.com/yizhan/spring-cloud-b:0.1-SNAPSHOT
        imagePullPolicy: Always
        name: spring-cloud-b-new
        ports:
        - containerPort: 8080
          protocol: TCP
        resources:
          requests:
            cpu: 250m
            memory: 512Mi
        livenessProbe:
          tcpSocket:
            port: 20002
          initialDelaySeconds: 10
          periodSeconds: 30
# C application base version
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: spring-cloud-c
  name: spring-cloud-c
spec:
  replicas: 2
  selector:
    matchLabels:
      app: spring-cloud-c
  template:
    metadata:
      annotations:
        msePilotCreateAppName: spring-cloud-c
      labels:
        app: spring-cloud-c
    spec:
      containers:
      - env:
        - name: LANG
          value: C.UTF-8
        - name: JAVA_HOME
          value: /usr/lib/jvm/java-1.8-openjdk/jre
        image: registry.cn-shanghai.aliyuncs.com/yizhan/spring-cloud-c:0.1-SNAPSHOT
        imagePullPolicy: Always
        name: spring-cloud-c
        ports:
        - containerPort: 8080
          protocol: TCP
        resources:
          requests:
            cpu: 250m
            memory: 512Mi
        livenessProbe:
          tcpSocket:
            port: 20003
          initialDelaySeconds: 10
          periodSeconds: 30
# C application gray version
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: spring-cloud-c-new
  name: spring-cloud-c-new
spec:
  replicas: 2
  selector:
    matchLabels:
      app: spring-cloud-c-new
  template:
    metadata:
      annotations:
        alicloud.service.tag: gray
        msePilotCreateAppName: spring-cloud-c
      labels:
        app: spring-cloud-c-new
    spec:
      containers:
      - env:
        - name: LANG
          value: C.UTF-8
        - name: JAVA_HOME
          value: /usr/lib/jvm/java-1.8-openjdk/jre
        image: registry.cn-shanghai.aliyuncs.com/yizhan/spring-cloud-c:0.1-SNAPSHOT
        imagePullPolicy: IfNotPresent
        name: spring-cloud-c-new
        ports:
        - containerPort: 8080
          protocol: TCP
        resources:
          requests:
            cpu: 250m
            memory: 512Mi
        livenessProbe:
          tcpSocket:
            port: 20003
          initialDelaySeconds: 10
          periodSeconds: 30
# Nacos Server
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: nacos-server
  name: nacos-server
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nacos-server
  template:
    metadata:
      labels:
        app: nacos-server
    spec:
      containers:
      - env:
        - name: MODE
          value: standalone
        image: nacos/nacos-server:latest
        imagePullPolicy: Always
        name: nacos-server
        resources:
          requests:
            cpu: 250m
            memory: 512Mi
      dnsPolicy: ClusterFirst
      restartPolicy: Always
# Nacos Server Service configuration
---
apiVersion: v1
kind: Service
metadata:
  name: nacos-server
spec:
  ports:
  - port: 8848
    protocol: TCP
    targetPort: 8848
  selector:
    app: nacos-server
  type: ClusterIP

Hands-on practice

Scenario 1: Automatically dye the traffic passing through a machine to achieve full-link grayscale

Sometimes we can distinguish the online baseline environment from the grayscale environment by domain name: the grayscale environment has its own configurable domain name. Suppose that requests to www.gray.com go to the grayscale environment and requests to www.base.com go to the baseline environment.

The call link is ingress-nginx -> A -> B -> C, where A can be a Spring Boot application.

Note: For both the gray and the base environment of entry application A, you need to turn on the "pass through by traffic ratio" switch for application A in the MSE service governance console. This enables passing the tag of the current environment on to downstream calls. When ingress-nginx routes a request to A's gray environment, the x-mse-tag: gray header is automatically added to subsequent calls even if the request carries no header, because this switch is on. The header value gray comes from the tag configured for application A. If the original request already carries x-mse-tag: gray, the tag in the original request takes precedence.

For entry application A, two K8s Services are configured: spring-cloud-a-base corresponds to the base version of A, and spring-cloud-a-gray corresponds to the gray version of A.
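The precedence described in this note can be modeled as a small function. This is a hedged sketch of the behavior as stated above, not the agent's actual code: `outgoing_tag_header`, `node_tag`, and `passthrough_enabled` are hypothetical names, and header keys are assumed to be lowercase already.

```python
from typing import Dict, Optional

TAG_HEADER = "x-mse-tag"


def outgoing_tag_header(incoming: Dict[str, str], node_tag: Optional[str],
                        passthrough_enabled: bool) -> Dict[str, str]:
    """Return the tag header to attach to downstream calls (empty dict if none)."""
    # A tag already present on the incoming request takes precedence.
    tag = incoming.get(TAG_HEADER)
    # Otherwise, with the switch on, the node passes its own environment tag on.
    if tag is None and passthrough_enabled and node_tag is not None:
        tag = node_tag
    return {TAG_HEADER: tag} if tag is not None else {}


# Request without a header arrives at A's gray node: the node's tag is added.
print(outgoing_tag_header({}, "gray", True))                     # {'x-mse-tag': 'gray'}
# Request tagged gray arrives at A's base node: the request's tag is kept.
print(outgoing_tag_header({"x-mse-tag": "gray"}, None, True))    # {'x-mse-tag': 'gray'}
# Untagged request at a base node: nothing is added.
print(outgoing_tag_header({}, None, True))                       # {}
```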

apiVersion: v1
kind: Service
metadata:
  name: spring-cloud-a-base
spec:
  ports:
    - name: http
      port: 20001
      protocol: TCP
      targetPort: 20001
  selector:
    app: spring-cloud-a

---
apiVersion: v1
kind: Service
metadata:
  name: spring-cloud-a-gray
spec:
  ports:
    - name: http
      port: 20001
      protocol: TCP
      targetPort: 20001
  selector:
    app: spring-cloud-a-new

Configure the Ingress rules for the entry. Requests to www.base.com are routed to the base version of application A, and requests to www.gray.com are routed to the gray version of application A.

apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  name: spring-cloud-a-base
spec:
  rules:
  - host: www.base.com
    http:
      paths:
      - backend:
          serviceName: spring-cloud-a-base
          servicePort: 20001
        path: /

---
apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  name: spring-cloud-a-gray
spec:
  rules:
  - host: www.gray.com
    http:
      paths:
      - backend:
          serviceName: spring-cloud-a-gray
          servicePort: 20001
        path: /

Results

Now requests to www.base.com are routed to the baseline environment:

curl -H "Host:www.base.com" http://106.14.155.223/a
A[172.18.144.155] -> B[172.18.144.120] -> C[172.18.144.79]

Requests to www.gray.com are routed to the grayscale environment:

curl -H "Host:www.gray.com" http://106.14.155.223/a
Agray[172.18.144.160] -> Bgray[172.18.144.57] -> Cgray[172.18.144.157]

Going further, suppose entry application A has no grayscale environment, so requests reach the base environment of A but should enter the grayscale environment at the A -> B hop. This can be achieved by adding the special header x-mse-tag, whose value is the tag of the desired environment, for example gray.

curl -H "Host:www.base.com" -H "x-mse-tag:gray" http://106.14.155.223/a
A[172.18.144.155] -> Bgray[172.18.144.139] -> Cgray[172.18.144.8]

As you can see, the first hop enters the base environment of A, but at A -> B the request returns to the grayscale environment.

The advantage of this approach is that configuration is simple: only the Ingress rules need to be configured. When an application needs a grayscale release, you only deploy it in the grayscale environment and grayscale traffic naturally reaches the grayscale machines. Once verification passes, the grayscale image is released to the baseline environment. If several applications need a grayscale release at the same time, add all of them to the grayscale environment.

Best practices
  • Tag all applications in the grayscale environment as gray. Applications in the baseline environment are untagged by default.

  • Route 2% of normal online traffic into the grayscale environment.


Scenario 2: Add a specific header to the traffic to achieve full-link grayscale

Some clients cannot change the domain name and want to reach the grayscale environment by accessing www.demo.com with a different header. For example, in the figure below, the grayscale environment is accessed by adding the x-mse-tag: gray header.

The Ingress rules for the demo are now as follows. Note the several nginx.ingress.kubernetes.io/canary annotations that have been added:

apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  name: spring-cloud-a-base
spec:
  rules:
  - host: www.demo.com
    http:
      paths:
      - backend:
          serviceName: spring-cloud-a-base
          servicePort: 20001
        path: /
---
apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  name: spring-cloud-a-gray
  annotations:
    nginx.ingress.kubernetes.io/canary: "true"
    nginx.ingress.kubernetes.io/canary-by-header: "x-mse-tag"
    nginx.ingress.kubernetes.io/canary-by-header-value: "gray"
    nginx.ingress.kubernetes.io/canary-weight: "0"
spec:
  rules:
  - host: www.demo.com
    http:
      paths:
      - backend:
          serviceName: spring-cloud-a-gray
          servicePort: 20001
        path: /
Results

Now requests to www.demo.com are routed to the baseline environment:

curl -H "Host:www.demo.com" http://106.14.155.223/a
A[172.18.144.155] -> B[172.18.144.56] -> C[172.18.144.156]

How do we reach the grayscale environment? Simply add the header x-mse-tag: gray to the request.

curl -H "Host:www.demo.com" -H "x-mse-tag:gray" http://106.14.155.223/a
Agray[172.18.144.82] -> Bgray[172.18.144.57] -> Cgray[172.18.144.8]

You can see that the Ingress routes directly to A's gray environment based on this header.
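As a rough mental model of how the canary annotations pick a backend, consider the sketch below. It is a simplification written for this article, not ingress-nginx code: it ignores the cookie-based canary rule, and `choose_backend` and its parameters are hypothetical names. With canary-weight set to 0, only requests whose header matches the configured value reach the canary (gray) Service.

```python
import random
from typing import Callable, Dict


def choose_backend(headers: Dict[str, str], canary_header: str, canary_value: str,
                   canary_weight: int,
                   rng: Callable[[], float] = random.random) -> str:
    """Simplified canary decision: exact header match first, then percentage weight."""
    if headers.get(canary_header) == canary_value:
        return "canary"
    # Any other header value is ignored and the weight rule applies.
    # rng() is in [0, 1), so a weight of 0 never selects the canary here.
    if rng() * 100 < canary_weight:
        return "canary"
    return "main"


print(choose_backend({"x-mse-tag": "gray"}, "x-mse-tag", "gray", 0))  # canary
print(choose_backend({}, "x-mse-tag", "gray", 0))                     # main
```

Raising the hypothetical `canary_weight` argument above 0 would additionally send that percentage of non-matching requests to the canary backend.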

Going further

Ingress can also implement more complex routing. For example, the client may already carry a header, and we want to reuse that header instead of adding a new one, as shown in the figure below. Suppose we want requests whose x-user-id is 100 to enter the grayscale environment.

Just add these four annotations:

apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  name: spring-cloud-a-base
spec:
  rules:
  - host: www.demo.com
    http:
      paths:
      - backend:
          serviceName: spring-cloud-a-base
          servicePort: 20001
        path: /
---
apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  name: spring-cloud-a-base-gray
  annotations:
    nginx.ingress.kubernetes.io/canary: "true"
    nginx.ingress.kubernetes.io/canary-by-header: "x-user-id"
    nginx.ingress.kubernetes.io/canary-by-header-value: "100"
    nginx.ingress.kubernetes.io/canary-weight: "0"
spec:
  rules:
  - host: www.demo.com
    http:
      paths:
      - backend:
          serviceName: spring-cloud-a-gray
          servicePort: 20001
        path: /

Requests carrying the special header that meets the condition enter the grayscale environment:

curl -H "Host:www.demo.com" -H "x-user-id:100" http://106.14.155.223/a
Agray[172.18.144.93] -> Bgray[172.18.144.24] -> Cgray[172.18.144.25]

Requests that do not meet the condition enter the baseline environment:

curl -H "Host:www.demo.com" -H "x-user-id:101" http://106.14.155.223/a
A[172.18.144.22] -> B[172.18.144.91] -> C[172.18.144.95]

Compared with scenario 1, the client's domain name stays the same and the environments are distinguished only by the request.

Scenario 3: Customize routing rules to perform full-link grayscale

Sometimes we do not want automatic pass-through and automatic routing. Instead, we want each application upstream and downstream on the call chain to define its own grayscale rules. For example, application B may want only requests that satisfy its custom rule to be routed to its grayscale version, while application C may want grayscale rules different from B's. How should this be configured? See the following figure for the scenario:

Note that it is best to remove the parameters configured in scenarios 1 and 2.

As a first step, add an environment variable to entry application A (preferably to all entry applications, both gray and base): alicloud.service.header=x-user-id. Here x-user-id is the header that needs to be passed through; the agent recognizes this header and passes it through automatically.

Note that x-mse-tag is not used here; it is a default system header and has special logic.

Second, for the middle application B, configure the tag routing rules on the MSE console.

Third, configure routing rules on the Ingress. For this step, refer to Scenario 2 and use the following configuration:

apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  name: spring-cloud-a-base
spec:
  rules:
  - host: www.base.com
    http:
      paths:
      - backend:
          serviceName: spring-cloud-a-base
          servicePort: 20001
        path: /
---
apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  annotations:
    nginx.ingress.kubernetes.io/canary: 'true'
    nginx.ingress.kubernetes.io/canary-by-header: x-user-id
    nginx.ingress.kubernetes.io/canary-by-header-value: '100'
    nginx.ingress.kubernetes.io/canary-weight: '0'
  name: spring-cloud-a-gray
spec:
  rules:
    - host: www.base.com
      http:
        paths:
          - backend:
              serviceName: spring-cloud-a-gray
              servicePort: 20001
            path: /
Results

Test and verify with the Ingress canary configuration in place. A request carrying the header that meets the condition is routed to the grayscale environment of B.

curl 120.77.215.62/a -H "Host: www.base.com" -H "x-user-id: 100"
Agray[192.168.86.42] -> Bgray[192.168.74.4] -> C[192.168.86.33]

A request carrying a header that does not meet the condition is routed to the base environment of B.

curl 120.77.215.62/a -H "Host: www.base.com" -H "x-user-id: 101"
A[192.168.86.35] -> B[192.168.73.249] -> C[192.168.86.33]

Now delete the Ingress canary configuration and access the base A service (the baseline entry application must have the alicloud.service.header environment variable). A request carrying the header that meets the condition is routed to the grayscale environment of B.

curl 120.77.215.62/a -H "Host: www.base.com" -H "x-user-id: 100"
A[192.168.86.35] -> Bgray[192.168.74.4] -> C[192.168.86.33]

Access the base environment with a header that does not meet the condition, and the request is routed to the base environment of B.

curl 120.77.215.62/a -H "Host: www.base.com" -H "x-user-id: 101"
A[192.168.86.35] -> B[192.168.73.249] -> C[192.168.86.33]
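
The results of this scenario can be reproduced with a small model in which every application evaluates its own rule against the passed-through header. The rule table below is an assumption made for illustration (the real rule is configured in the MSE console and is not shown in this article), and `rules`, `route`, and `call_chain` are hypothetical names.

```python
from typing import Callable, Dict, List

Rule = Callable[[Dict[str, str]], bool]

# Hypothetical per-application gray rules. An application without a rule always
# uses its base version. Only B defines a rule here, mirroring the demo output.
rules: Dict[str, Rule] = {
    "B": lambda headers: headers.get("x-user-id") == "100",
}


def route(app: str, headers: Dict[str, str]) -> str:
    """Pick the gray or base version of one application from its own rule."""
    rule = rules.get(app)
    return f"{app}gray" if rule is not None and rule(headers) else app


def call_chain(entry: str, headers: Dict[str, str]) -> List[str]:
    """The entry version is chosen by the Ingress; B and C decide independently."""
    return [entry] + [route(app, headers) for app in ("B", "C")]


print(" -> ".join(call_chain("Agray", {"x-user-id": "100"})))  # Agray -> Bgray -> C
print(" -> ".join(call_chain("A", {"x-user-id": "100"})))      # A -> Bgray -> C
print(" -> ".join(call_chain("A", {"x-user-id": "101"})))      # A -> B -> C
```

Unlike scenario 1, no tag is inherited from the previous hop: C stays on base even when B is gray, because C has no rule of its own.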

Conclusion

In 20 minutes we put into practice a full-link grayscale capability that is technically demanding to build from scratch. Full-link grayscale is not so difficult after all!

Based on the full-link grayscale capability of MSE service governance, we can quickly implement enterprise-level full-link grayscale. The three scenarios above are standard scenarios that have been implemented at scale in production. We can also customize and adapt them to our own business on top of MSE service governance; even with multiple traffic sources, traffic can be directed precisely according to business needs.

At the same time, the observability provided by MSE Service Governance Professional Edition makes the effectiveness of a grayscale release measurable.

Second-level monitoring of grayscale traffic

Standardize the release process

In daily releases, we often fall into the following mistaken ideas:

  • The change is small and the launch is urgent, so there is no need to test before releasing online.

  • The release does not need to go through the grayscale process; just release it online quickly.

  • Grayscale release is useless, a mere formality; release straight to production without waiting to observe.

  • Grayscale release is important, but a grayscale environment is hard to build and costs time and effort, so it is not a priority.

All of these ideas can lead to a bad release, and many failures are caused directly or indirectly by releases. Improving release quality and reducing errors is therefore key to reducing online failures. To release safely, we need to standardize the release process.

Closing thoughts

As microservices become more popular, more and more companies adopt microservice frameworks. With high cohesion and low coupling, microservices provide better fault tolerance and suit rapid business iteration, which brings developers a lot of convenience. However, as the business grows, the split into microservices becomes more and more complex, and microservice governance becomes a headache.

Take full-link grayscale alone: while verifying that a new version works correctly before it goes online, we also have to consider release efficiency. If our application is small, we can simply maintain several complete environments to ensure that releases are correct. However, once the business grows large and complex, say a system of 100 microservices, then even if each service takes only 1 to 2 pods in the test or grayscale environment, maintaining that many environments brings huge cost and operational-efficiency challenges.

Is there a simpler and more efficient way to solve the problem of microservice governance?

MSE Microservice Engine will launch Service Governance Professional Edition, providing a complete and professional microservice governance solution out of the box to help enterprises better realize their microservice governance capabilities. If your system can quickly gain the complete full-link grayscale capability described in this article, and build further microservice governance practices on it, you can not only save considerable labor and cost but also explore the microservice field with more confidence.

For more information, scan the QR code below or search for AlibabaCloud888 on WeChat to add the cloud native assistant.