Author | Peng Li (Yuan Yi)    Source | Serverless public account

One, Knative

Knative provides traffic-based autoscaling: it automatically scales out the number of instances according to application requests during peak hours, and automatically scales in when requests drop, saving resource costs. In addition, Knative provides traffic-based grayscale release, which shifts a configurable percentage of traffic to a new version.

Before introducing Knative grayscale release and autoscaling, let's first look at the traffic request mechanism in ASK Knative.

As shown in the figure above, the overall traffic request mechanism is divided into the following parts:

  • On the left is the version information of the Knative Service, where you can set the traffic percentage for each version. Below that is the routing policy: based on it, the Ingress Controller writes the corresponding routing rules to Alibaba Cloud SLB.

  • On the right are the Revisions, the versions created for the Service. A Revision owns resources such as a Deployment; when traffic comes in through SLB, it is forwarded directly to the backend Pods according to the corresponding forwarding rules.

In addition to the traffic request mechanism, the figure above also shows the corresponding elastic policies, such as KPA and HPA.

Two, Service lifecycle

A Service is the resource object that developers operate on directly; it consists of two parts: Route and Configuration.

As shown in the preceding figure, you can configure the image, environment variables, and other container information in the Configuration.

1. Configuration

The Configuration:

  • Manages the desired state of the container;
  • Acts like a version controller: each update to the Configuration creates a new Revision.

As shown in the figure above, the Configuration is very similar to the configuration section of the Knative Service itself: it holds the resource information expected for the container.
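Since the template part of a Service carries the container's desired state, it can be sketched as a minimal Knative Service manifest. The names and image below are illustrative, not taken from the article:

```yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: hello                      # illustrative service name
spec:
  template:                        # this template corresponds to the Configuration
    metadata:
      name: hello-v1               # each change to the template yields a new Revision
    spec:
      containers:
        - image: registry.example.com/hello:v1   # illustrative image
          env:
            - name: TARGET
              value: "Knative"
```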

2. Route

The Route can:

  • Control traffic distribution to different Revisions;
  • Distribute traffic by percentage.

As shown in the figure above, a Route resource contains the traffic information: in it you can set the target versions and the traffic ratio of each version.
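The traffic section described above can be sketched as a minimal Route manifest; the revision name is illustrative:

```yaml
apiVersion: serving.knative.dev/v1
kind: Route
metadata:
  name: hello
spec:
  traffic:
    - revisionName: hello-v1   # illustrative revision name
      percent: 100             # all traffic goes to this Revision
```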

3. Revision

  • A snapshot of the Configuration;
  • Used for version tracking and rollback.

Revision is the version-management resource of a Knative Service. It is a snapshot of the Configuration, and each update to the Configuration creates a new Revision. Revisions can be used for version tracking, grayscale release, and rollback. In a Revision resource you can see the configured image information directly.

Three, Traffic-based grayscale release

As shown in the figure above, suppose Revision V1 is created initially. When a new version is needed, we update the Configuration in the Service, which creates V2. We then set different traffic ratios for V1 and V2 through the Route; in the figure, V1 gets 70% and V2 gets 30%, so traffic is distributed to the two versions at a ratio of 7:3. Once V2 is verified to have no problems, the grayscale can continue by adjusting the traffic ratio until the new version V2 reaches 100%.
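The 70/30 split above corresponds to a traffic section like the following sketch (revision names are illustrative); continuing the grayscale simply means raising the V2 percentage step by step:

```yaml
spec:
  traffic:
    - revisionName: hello-v1
      percent: 70    # old version keeps most of the traffic
    - revisionName: hello-v2
      percent: 30    # new version is grayscale-tested with 30%
```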

During the grayscale process, if the new version is found to be abnormal, you can adjust the traffic ratio at any time to roll back. Suppose a problem is found in V2 when the grayscale reaches 30%: we can shift the ratio back and set 100% of the traffic to the original V1 to complete the rollback.

In addition, we can also attach a Tag to a Revision in the Route's traffic section. Once tagged, Knative automatically generates a directly accessible URL for that Revision. Through this URL we can send traffic straight to that specific version, which makes it possible to debug a single version.
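A tagged Revision might look like the sketch below; the tag name and domain are illustrative, and the exact URL depends on the cluster's domain configuration:

```yaml
spec:
  traffic:
    - revisionName: hello-v2
      percent: 0         # receives no mainline traffic
      tag: staging       # Knative generates a dedicated URL for this tag,
                         # e.g. staging-hello.default.example.com
```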

Four, Autoscaling

Knative provides a rich set of autoscaling strategies, and ASK Knative extends them with some additional mechanisms. They are introduced below:

  • Knative Pod Autoscaler (KPA);
  • Horizontal Pod Autoscaler (HPA);
  • Scheduled scaling combined with HPA;
  • Event gateway (precise, request-based scaling);
  • Custom scaling plug-ins.

1. Automatic scaling: KPA

Figure: Knative Pod Autoscaler (KPA)

As shown in the figure above, the Route can be understood as a traffic gateway, and the Activator carries the 0-to-1 responsibility in Knative. When there are no requests, Knative points the service at the Activator Pods. As soon as the first request arrives, it enters the Activator; the Activator then asks the Autoscaler to scale up Pods, and after scale-up it forwards the request to the corresponding Pods. Once the Pods are ready, the Route sends traffic directly to them, and the Activator's job is done.
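KPA behavior is typically tuned through annotations on the Revision template; a minimal sketch, assuming the standard Knative autoscaling annotations (the target value is illustrative):

```yaml
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/class: kpa.autoscaling.knative.dev  # KPA (the default)
        autoscaling.knative.dev/target: "10"   # target concurrent requests per Pod
        autoscaling.knative.dev/minScale: "0"  # 0 allows scale-to-zero via the Activator
```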

In the 1-to-N phase, the queue-proxy container in each Pod collects the request concurrency metric for that Pod. The Autoscaler aggregates these request metrics and computes the required scale-out, achieving the final traffic-based scaling.
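The 1-to-N sizing rule can be sketched in a few lines of Python. This is a simplified illustration, not Knative's actual implementation; the target and the reported concurrencies are illustrative:

```python
import math

def desired_pods(per_pod_concurrency, target=10, max_scale=100):
    """Aggregate the concurrency reported by each Pod's queue-proxy and
    divide by the per-Pod target (a simplified sketch of the KPA rule)."""
    total = sum(per_pod_concurrency)
    return min(max_scale, max(1, math.ceil(total / target)))

# four Pods report 35 concurrent requests in total; target is 10 per Pod
print(desired_pods([8, 9, 7, 11]))  # -> 4
```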

2. Horizontal scaling: HPA

Figure: Horizontal Pod Autoscaler (HPA)

This is actually an encapsulation of the native K8s HPA. By configuring the corresponding metrics and policy on the Revision, Knative uses the native K8s HPA to scale automatically on CPU and Memory.
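Switching a Revision to the native HPA is done through the autoscaling class annotation; a sketch using the standard Knative annotation keys (the target value is illustrative):

```yaml
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/class: hpa.autoscaling.knative.dev
        autoscaling.knative.dev/metric: cpu    # scale on CPU (memory is also possible)
        autoscaling.knative.dev/target: "70"   # target 70% CPU utilization
        autoscaling.knative.dev/minScale: "1"  # HPA cannot scale to zero
```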

3. Scheduled scaling combined with HPA

  • Plan capacity in advance to pre-warm resources;
  • Combine with CPU and Memory metrics.

On Knative, we combine scheduled scaling with HPA to plan capacity in advance and pre-warm resources. With plain K8s, HPA scales out only after a metric crosses its threshold, which cannot satisfy sudden-burst scenarios. For predictable periodic load, you can plan in advance the capacity to scale out to during a specified time window.

We also combine this with CPU and Memory. For example, a time window may be planned for 10 Pods while the current CPU threshold calls for 20 Pods; in that case the maximum of the two, 20 Pods, is used for scale-out, which is the most basic guarantee of service stability.
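The merge rule described above (take whichever demand is larger) can be sketched as:

```python
def effective_replicas(scheduled, hpa_desired):
    """Merge a pre-planned (scheduled) capacity with the HPA's CPU/Memory
    demand by taking the maximum (a simplified sketch of the rule above)."""
    return max(scheduled, hpa_desired)

print(effective_replicas(10, 20))  # CPU pressure wins -> 20
print(effective_replicas(10, 3))   # the pre-warmed plan wins -> 10
```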

4. Event gateway

  • Autoscaling based on the number of requests;
  • 1-to-1 task distribution.

The event gateway provides precise, request-based scaling. When events come in, they first enter the event gateway, and we scale Pods according to the number of requests currently arriving. After scale-out there is a need to distribute tasks to Pods one-to-one, because sometimes a Pod can handle only one request at a time; this is the scenario the event gateway addresses.
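The 1-to-1 distribution idea can be sketched as follows; the Pod naming and the scale cap are illustrative:

```python
def dispatch_one_to_one(pending_requests, max_scale=50):
    """Scale to one Pod per in-flight request and hand each request to
    exactly one Pod (a simplified sketch of the event-gateway behavior)."""
    pods = [f"pod-{i}" for i in range(min(max_scale, len(pending_requests)))]
    return dict(zip(pods, pending_requests))

print(dispatch_one_to_one(["req-1", "req-2", "req-3"]))
```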

5. Custom scaling plug-ins

There are two key points for a custom scaling plug-in:

  • Collecting metrics;
  • Adjusting the number of Pod instances.

Where do the metrics come from? As with the traffic-based KPA provided by the Knative community, metrics are pulled from each Pod's queue-proxy container by a timed task. The controller then aggregates these metrics and computes how many Pods need to be scaled out. How is the scaling performed? By adjusting the replica count of the corresponding Deployment.
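Putting the two key points together, one reconcile iteration of a custom scaler could be sketched like this; the metric source and target are illustrative, and the Deployment patch is shown only as a comment:

```python
import math

def reconcile(pod_metrics, target_per_pod):
    """One loop iteration of a custom-scaler sketch: aggregate the metrics
    collected from each Pod and compute the desired replica count."""
    total = sum(pod_metrics.values())                 # 1) aggregate collected metrics
    return max(1, math.ceil(total / target_per_pod))  # 2) desired Pod count

# The controller would then write the result back to the Deployment, e.g. with
# the official Kubernetes Python client (not executed here):
#   apps_v1.patch_namespaced_deployment_scale(name, namespace,
#       {"spec": {"replicas": reconcile(metrics, 10)}})
print(reconcile({"pod-a": 12, "pod-b": 5}, target_per_pod=10))  # -> 2
```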

With metric collection and Pod-count adjustment in place, it is easy to implement a custom scaling plug-in.

Five, Hands-on demonstration

The following demonstration mainly covers:

  • Traffic-based grayscale release;
  • Traffic-based autoscaling.

Demo video link: developer.aliyun.com/live/246127

About the author: Peng Li (alias: Yuan Yi) is a senior development engineer on the Alibaba Cloud Container Platform. He joined Alibaba in 2016 and was deeply involved in Alibaba's company-wide containerization, supporting the Double Eleven containerization links for many years. He focuses on cloud-native areas such as containers, Kubernetes, Service Mesh, and Serverless, and is committed to building a new generation of Serverless platform. He is currently responsible for Knative-related work in Alibaba Cloud Container Service.