15 minutes to achieve enterprise applications lossless online

Many application systems with a large number of users and a high degree of concurrency generally choose to release in the middle of the night when the traffic is small in order to avoid loss of traffic during the release process. Although this is effective, the r&d operation and maintenance cost behind it is not controllable, which is a great burden for enterprises. Based on this, ali cloud service engine MSE in the process of application, by applying the adaptive for + active advise when offline, online application ready when inspection with micro service life cycle alignment + service such as preheating technology means the service provided by the application condition and line function, can effectively help enterprises to avoid the online publishing of traffic information loss.

Lossless up and down line function design

Common traffic loss causes include but are not limited to the following: • Services fail to be offline in a timely manner: Service consumers perceive that the service list in the registry is delayed. As a result, service consumers still invoke the offline application for a period of time, resulting in request errors. • Slow initialization: The application starts to load resources initially due to heavy traffic on the receiving line. As a result, the initialization process is slow. A large number of requests and responses time out, block, and resources are exhausted, causing the application to break down. • Registering too early: The service has an asynchronous resource loading problem. The service is registered with the registry before it is fully initialized. As a result, the request response is slow before the resource loading is complete and an error occurs when the invocation times out. • Release state is not aligned with running state: Use Kubernetes rolling publishing function for application publishing, because Kubernetes rolling publishing generally associated with the ready checking mechanism, is to check whether the application specific port is started as a sign of application ready to trigger the next batch of instance publishing, However, in microservice applications, service invocation can only be provided when the application completes service registration. Therefore, in some cases, the new application is not registered in the registry, and the old application instance is taken offline, resulting in no service available.

Nondestructive offline

The service cannot be offline in time, as shown in Figure 1 below:

Figure 1. Spring Cloud application consumers are unable to sense when provider services go offline

For Spring Cloud applications, when two instances of the application, A ‘and A of A, go offline, due to the Spring Cloud framework to balance availability and performance, the default consumer is 30 seconds to go to the registry to pull the latest list of services. Therefore, the offline instance of A cannot be sensed in real time. If consumers continue to call A through the local cache, traffic loss will occur when they call the offline instance.

To solve this problem, the lossless offline function designed and implemented by Ali Cloud microservice engine MSE based on Java Agent bytecode technology is shown in Figure 2 below:

Figure 2. Lossless logoff scheme

In this lossless lossless lossless lossless lossless solution, the service provider application only needs to access MSE. There will be an adaptive waiting period before the application goes offline, during which it is expected that the offline application will send offline events to service consumers who have sent requests during the adaptive waiting period through active notification. After receiving the offline event, consumers will actively pull the registry service instance list to sense the application offline event in real time and avoid the loss of application offline traffic caused by invoking the offline instance.

Nondestructive on-line

Lazy loading is one of the most common strategies in software framework design. For example, in the Spring Cloud framework, the Ribbon component’s initialization time for the pull service list is the first call of the service by default. For example, in the Spring Cloud application, figure 3 shows the first and second requests to invoke the remote service from the RestTemplate:

Figure 3. Application startup resource initialization versus normal operation time

As you can see from the test results, the first call takes several times more time than normal due to some resource initialization. Therefore, when a new application is released online to process heavy traffic, slow response to a large number of requests, resource congestion, and application instances break down may occur. To solve the problem of slow initialization of application resources under heavy traffic, the low-traffic preheating function provided by MSE protects new instances by adjusting the traffic allocated to newly online applications before processing normal traffic. The low-flow preheating process is shown in Figure 4 below:

FIG. 4. The relationship between QPS and startup time in the warm-up process of small-flow service

In addition to the lossy online problem caused by slow initialization of the first call of the above application, MSE also provides a set of non-destructive on-line means to meet the requirements of non-destructive on-line of various applications, such as resource pre-connection, delayed registration, service registration before Kubernetes readiness check is passed, and service warm-up before Kubernetes readiness check is passed, as shown in Figure 5:

Figure 5. MSE lossless on-line scheme

How to use MSE lossless unwinding

Next, the best practices of lossless offline and service warm-up capabilities provided by Ali Cloud microservice engine MSE at application launch will be demonstrated. Assume that the application architecture consists of a Zuul gateway and a back-end microservice application instance (Spring Cloud). The specific back-end call links include shopping cart application A, transaction center application B, and inventory center application C. Services in these applications are registered and discovered through the Nacos registry.

The premise condition

Enable MSE microservice management

• A Kubernetes cluster has been created. For details, see Creating a Managed Kubernetes Cluster [1]. • MSE Microservice Governance Professional has been opened. For details, see Enabling MSE Microservice Governance [2].

The preparatory work

Note that the Agent used in this practice is still in gray scale, and the application Agent needs to be upgraded in gray scale. The upgrade document is help.aliyun.com/document_de…

To deploy applications in different regions (only domestic regions are supported), use the corresponding Agent download address: http://arms-apm-cn-[regionId]. Oss-cn -[regionId].aliyuncs.com/2.7.1.3-mse-beta/, pay attention to replace the address of [regionId], RegionId is ali Cloud RegionId,

For example Region Agent address is: Beijing arms-apm-cn-beijing.oss-cn-beijing.aliyuncs.com/2.7.1.3-mse…

Application deployment traffic Architecture diagram

Figure 6. Illustrates the application deployment architecture

Flow pressure source

In the spring-Cloud-Zuul application, as shown in Figure 6, service calls are made simultaneously to the grayscale version and normal version of Spring-Cloud-A at the rate of QPS 100.

Deploy the Demo application

Save the following in a file, say mse-demo.yaml, Execute kubectl apply -f mse-demo.yaml to deploy the application to the pre-created Kubernetes cluster (note that because demo has CronHPA tasks, Please install the ack-kubernetes- cronhPA-Controller component in the container service -kubernetes- > Marketplace -> Application directory to install the component in the test cluster. Application A, APPLICATION B, and application C deploy A baseline version and A grayscale version respectively. Application B disables the lossless offline function in the baseline version, and the grayscale version enables the lossless offline function. C The service preheating capability is enabled for the application, and the preheating duration is 120 seconds.

# Nacos Server
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: nacos-server
  name: nacos-server
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nacos-server
  template:
    metadata:
      labels:
        app: nacos-server
    spec:
      containers:
      - env:
        - name: MODE
          value: standalone
        image: registry.cn-shanghai.aliyuncs.com/yizhan/nacos-server:latest
        imagePullPolicy: Always
        name: nacos-server
        resources:
          requests:
            cpu: 250m
            memory: 512Mi
      dnsPolicy: ClusterFirst
      restartPolicy: Always
# Nacos Server Service 配置
---
apiVersion: v1
kind: Service
metadata:
  name: nacos-server
spec:
  ports:
  - port: 8848
    protocol: TCP
    targetPort: 8848
  selector:
    app: nacos-server
  type: ClusterIP
#入口 zuul 应用
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: spring-cloud-zuul
spec:
  replicas: 1
  selector:
    matchLabels:
      app: spring-cloud-zuul
  template:
    metadata:
      annotations:
        msePilotAutoEnable: "on"
        msePilotCreateAppName: spring-cloud-zuul
      labels:
        app: spring-cloud-zuul
    spec:
      containers:
        - env:
            - name: JAVA_HOME
              value: /usr/lib/jvm/java-1.8-openjdk/jre
            - name: LANG
              value: C.UTF-8
          image: registry.cn-shanghai.aliyuncs.com/yizhan/spring-cloud-zuul:1.0.1
          imagePullPolicy: Always
          name: spring-cloud-zuul
          ports:
            - containerPort: 20000
# A 应用 base 版本,开启按照机器纬度全链路透传
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: spring-cloud-a
  name: spring-cloud-a
spec:
  replicas: 2
  selector:
    matchLabels:
      app: spring-cloud-a
  template:
    metadata:
      annotations:
        msePilotCreateAppName: spring-cloud-a
        msePilotAutoEnable: "on"
      labels:
        app: spring-cloud-a
    spec:
      containers:
      - env:
        - name: LANG
          value: C.UTF-8
        - name: JAVA_HOME
          value: /usr/lib/jvm/java-1.8-openjdk/jre
        - name: profiler.micro.service.tag.trace.enable
          value: "true"
        image: registry.cn-shanghai.aliyuncs.com/yizhan/spring-cloud-a:0.1-SNAPSHOT
        imagePullPolicy: Always
        name: spring-cloud-a
        ports:
        - containerPort: 20001
          protocol: TCP
        resources:
          requests:
            cpu: 250m
            memory: 512Mi
        livenessProbe:
          tcpSocket:
            port: 20001
          initialDelaySeconds: 10
          periodSeconds: 30
      
# A 应用 gray 版本,开启按照机器纬度全链路透传
---            
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: spring-cloud-a-gray
  name: spring-cloud-a-gray
spec:
  replicas: 2
  selector:
    matchLabels:
      app: spring-cloud-a-gray
  strategy:
  template:
    metadata:
      annotations:
        alicloud.service.tag: gray
        msePilotCreateAppName: spring-cloud -a
        msePilotAutoEnable: "on"
      labels:
        app: spring-cloud-a-gray
    spec:
      containers:
      - env:
        - name: LANG
          value: C.UTF-8
        - name: JAVA_HOME
          value: /usr/lib/jvm/java-1.8-openjdk/jre
        - name: profiler.micro.service.tag.trace.enable
          value: "true"
        image: registry.cn-shanghai.aliyuncs.com/yizhan/spring-cloud-a:0.1-SNAPSHOT
        imagePullPolicy: Always
        name: spring-cloud-a-gray
        ports:
        - containerPort: 20001
          protocol: TCP
        resources:
          requests:
            cpu: 250m
            memory: 512Mi
        livenessProbe:
          tcpSocket:
            port: 20001
          initialDelaySeconds: 10
          periodSeconds: 30
            
# B 应用 base 版本，关闭无损下线能力
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: spring-cloud-b
  name: spring-cloud-b
spec:
  replicas: 2
  selector:
    matchLabels:
      app: spring-cloud-b
  strategy:
  template:
    metadata:
      annotations:
        msePilotCreateAppName: spring-cloud-b
        msePilotAutoEnable: "on"
      labels:
        app: spring-cloud-b
    spec:
      containers:
      - env:
        - name: LANG
          value: C.UTF-8
        - name: JAVA_HOME
          value: /usr/lib/jvm/java-1.8-openjdk/jre
        - name: micro.service.shutdown.server.enable
          value: "false"
        - name: profiler.micro.service.http.server.enable
          value: "false"
        image: registry.cn-shanghai.aliyuncs.com/yizhan/spring-cloud-b:0.1-SNAPSHOT
        imagePullPolicy: Always
        name: spring-cloud-b
        ports:
        - containerPort: 8080
          protocol: TCP
        resources:
          requests:
            cpu: 250m
            memory: 512Mi
        livenessProbe:
          tcpSocket:
            port: 20002
          initialDelaySeconds: 10
          periodSeconds: 30
            
# B 应用 gray 版本,默认开启无损下线功能
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: spring-cloud-b-gray
  name: spring-cloud-b-gray
spec:
  replicas: 2
  selector:
    matchLabels:
      app: spring-cloud-b-gray
  template:
    metadata:
      annotations:
        alicloud.service.tag: gray
        msePilotCreateAppName: spring-cloud-b
        msePilotAutoEnable: "on"
      labels:
        app: spring-cloud-b-gray
    spec:
      containers:
      - env:
        - name: LANG
          value: C.UTF-8
        - name: JAVA_HOME
          value: /usr/lib/jvm/java-1.8-openjdk/jre
        image: registry.cn-shanghai.aliyuncs.com/yizhan/spring-cloud-b:0.1-SNAPSHOT
        imagePullPolicy: Always
        name: spring-cloud-b-gray
        ports:
        - containerPort: 8080
          protocol: TCP
        resources:
          requests:
            cpu: 250m
            memory: 512Mi
        lifecycle:
            preStop:
              exec:
                command:
                  - /bin/sh
                  - '-c'
                  - >-
                    wget http://127.0.0.1:54199/offline 2>/tmp/null;sleep
                    30;exit 0
        livenessProbe:
          tcpSocket:
            port: 20002
          initialDelaySeconds: 10
          periodSeconds: 30
            
# C 应用 base 版本
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: spring-cloud-c
  name: spring-cloud-c
spec:
  replicas: 2
  selector:
    matchLabels:
      app: spring-cloud-c
  template:
    metadata:
      annotations:
        msePilotCreateAppName: spring-cloud-c
        msePilotAutoEnable: "on"
      labels:
        app: spring-cloud-c
    spec:
      containers:
      - env:
        - name: LANG
          value: C.UTF-8
        - name: JAVA_HOME
          value: /usr/lib/jvm/java-1.8-openjdk/jre
        image: registry.cn-shanghai.aliyuncs.com/yizhan/spring-cloud-c:0.1-SNAPSHOT
        imagePullPolicy: Always
        name: spring-cloud-c
        ports:
        - containerPort: 8080
          protocol: TCP
        resources:
          requests:
            cpu: 250m
            memory: 512Mi
        livenessProbe:
          tcpSocket:
            port: 20003
          initialDelaySeconds: 10
          periodSeconds: 30
#HPA 配置
---
apiVersion: autoscaling.alibabacloud.com/v1beta1
kind: CronHorizontalPodAutoscaler
metadata:
  labels:
    controller-tools.k8s.io: "1.0"
  name: spring-cloud-b
spec:
   scaleTargetRef:
      apiVersion: apps/v1beta2
      kind: Deployment
      name: spring-cloud-b
   jobs:
   - name: "scale-down"
     schedule: "0 0/5 * * * *"
     targetSize: 1
   - name: "scale-up"
     schedule: "10 0/5 * * * *"
     targetSize: 2
---
apiVersion: autoscaling.alibabacloud.com/v1beta1
kind: CronHorizontalPodAutoscaler
metadata:
  labels:
    controller-tools.k8s.io: "1.0"
  name: spring-cloud-b-gray
spec:
   scaleTargetRef:
      apiVersion: apps/v1beta2
      kind: Deployment
      name: spring-cloud-b-gray
   jobs:
   - name: "scale-down"
     schedule: "0 0/5 * * * *"
     targetSize: 1
   - name: "scale-up"
     schedule: "10 0/5 * * * *"
     targetSize: 2
---
apiVersion: autoscaling.alibabacloud.com/v1beta1
kind: CronHorizontalPodAutoscaler
metadata:
  labels:
    controller-tools.k8s.io: "1.0"
  name: spring-cloud-c
spec:
   scaleTargetRef:
      apiVersion: apps/v1beta2
      kind: Deployment
      name: spring-cloud-c
   jobs:
   - name: "scale-down"
     schedule: "0 2/5 * * * *"
     targetSize: 1 
   - name: "scale-up"
     schedule: "10 2/5 * * * *"
     targetSize: 2
# zuul 网关开启 SLB 暴露展示页面   
---     
apiVersion: v1
kind: Service
metadata:
  name: zuul-slb
spec:
  ports:
    - port: 80
      protocol: TCP
      targetPort: 20000
  selector:
    app: spring-cloud-zuul
  type: ClusterIP
# a 应用暴露 k8s service
---
apiVersion: v1
kind: Service
metadata:
  name: spring-cloud-a-base
spec:
  ports:
    - name: http
      port: 20001
      protocol: TCP
      targetPort: 20001
  selector:
    app: spring-cloud-a
---
apiVersion: v1
kind: Service
metadata:
  name: spring-cloud-a-gray
spec:
  ports:
    - name: http
      port: 20001
      protocol: TCP
      targetPort: 20001
  selector:
    app: spring-cloud-a-gray
# Nacos Server SLB Service 配置
---
apiVersion: v1
kind: Service
metadata:
  name: nacos-slb
spec:
  ports:
  - port: 8848
    protocol: TCP
    targetPort: 8848
  selector:
    app: nacos-server
  type: LoadBalancer
Copy the code

Result Verification 1: Lossless offline function

Since we have timed HPA enabled for both spring-Cloud-B and Spring-Cloud-b-Gray applications, the simulation performs timed scaling every 5 minutes.

Log in to MSE console, enter Microservice Governance Center -> Application List -> Spring-Cloud-A -> Application Details. From the application monitoring curve, we can see the traffic data of Spring-Cloud-A application:

For gray traffic, the number of request errors is 0 during pod capacity expansion and reduction, and there is no traffic loss. In the unmarked version, because the lossless offline function was disabled, 20 requests sent from Spring-Cloud-A to Spring-cloud-B were reported wrong during pod capacity expansion and reduction, resulting in request traffic loss.

Result verification two: service preheating function

In spring-Cloud-C application, we started timed HPA simulation to simulate the application going online, and scaled up every 5 minutes. The scale was scaled down to 1 node at 0 seconds in the second minute and 2 nodes at 10 seconds in the second minute.

Enable the service preheating function on spring-cloud-b, the consumer side of the preheating application.

Enable the service preheating function on spring-cloud-c, the service provider of the preheating application. The preheating time is set to 120 seconds.

The traffic on the node increases slowly. You can also see the node warm-up start and end times, as well as related events.

From the picture above you can see open preheating function after the restart of the application of flow rate will increase slowly over time, some applications in need in the process of start pre-built connection pool and caching resources such as slow start scene, open service preheating can effectively protect the application cache resources in order to create application security in the process of start start so as to realize the flow condition of the application online.

Program introduction & practical operation

More design details, please see how to realize the nondestructive and micro service application line theme live [3] video playback: yqh.aliyun.com/live/detail…

A link to the

[1] Create a managed Kubernetes cluster help.aliyun.com/document_de…

[2] Open MSE microservice governance help.aliyun.com/document_de…

[3] condition from top to bottom line theme live yqh.aliyun.com/live/detail… Release the latest information of cloud native technology, collect the most complete content of cloud native technology, hold cloud native activities and live broadcast regularly, and release ali products and user best practices. Explore the cloud native technology with you and share the cloud native content you need.

Pay attention to [Alibaba Cloud native] public account, get more cloud native real-time information!