Author | Winter Island, Alibaba Technical Expert

With fully hosted Knative on ASK, the resident instances cost you nothing out of the box. Combined with an SLB-based Gateway provided as a cloud product and reserved instances based on burstable performance instances, you can dramatically cut your IaaS expenses, and every penny you pay is put to good use.

< Follow the Alibaba Cloud Native official account and reply with the report keyword to download the full survey report >

The annual survey report released by CNCF shows that Serverless technology gained further recognition in 2019: 41% of respondents said they already use Serverless, and another 20% said they plan to adopt it in the next 12-18 months. Among the open source Serverless projects, Knative is the most popular. As the chart below shows, Knative holds a 34% share, far ahead of second-place OpenFaaS, making it the first choice for teams building their own Serverless platform.

Knative’s popularity is closely tied to the container ecosystem. Unlike FaaS, Knative does not require users to make drastic changes to their applications: as long as an application is containerized, it can be deployed on Knative. On top of Kubernetes, Knative provides a more focused application model, so users no longer need to handle application upgrades or traffic grayscale themselves; all of that is done automatically.

The evolution of cloud hosting

Before cloud computing, an enterprise that wanted to provide services over the Internet had to lease physical servers in an IDC and deploy its applications directly on those machines. Physical machine performance has kept growing at the pace of Moore’s Law over the past decades, with the result that a single application can no longer fully utilize the resources of an entire machine. Some technology was therefore needed to improve resource utilization. The naive answer is to deploy more applications on the same machine, but mixing multiple applications on one physical machine causes many problems, such as:

  • Port conflicts
  • Lack of resource isolation
  • Conflicting system dependencies and difficult operations and maintenance

Virtual machine (VM) technology makes it possible to create multiple virtual hosts on a single physical machine, with each virtual host running only one application. In this way, one physical machine can serve multiple applications while keeping them independent of each other.

As an enterprise grows, it may maintain many applications, each of which needs frequent releases, upgrades, and rollbacks, and may also need to be deployed across multiple regions. This creates a lot of operations and maintenance challenges, starting with managing the application’s runtime environment. Container technology emerged to address this: with lightweight, kernel-level isolation it provides an isolation experience close to that of a VM, and it also brought a major innovation, the container image. A container image makes it easy to replicate an application’s runtime environment: developers simply package the application’s dependencies into the image, and when the image runs it uses those built-in dependencies to serve requests. This solves the runtime environment problem for application distribution, upgrades and rollbacks, and multi-region deployment.

Once people started using container technology at scale, they found that the burden of maintaining instance runtime environments dropped dramatically, and the biggest remaining problem became coordinating many instances across many applications. Soon after containers became popular, Kubernetes appeared. Unlike VM or container technology, Kubernetes is a natively distributed, end-state-oriented design rather than a single-machine capability. Kubernetes hides IaaS resource allocation behind a more user-friendly API: users do not need to care about the specific allocation details, and the Kubernetes controllers automatically handle allocation, failover, and load balancing based on the declared end state. Application developers no longer care where a specific instance runs, as long as Kubernetes allocates resources when needed.

In both the early physical machine model and the current Kubernetes model, application developers never really wanted to manage underlying resources; they just wanted to run their applications. In the physical machine model, people had to own the physical machines. In the Kubernetes model, people no longer care which physical machine their business processes run on, and in fact cannot predict it in advance; as long as the application runs, it does not matter where. From physical machines to virtual machines to containers to Kubernetes, the whole progression is about making it simpler for applications to use IaaS resources. A clear thread runs through this evolution: the coupling between IaaS and applications keeps getting looser, and the underlying platform only needs to allocate IaaS resources to applications when they need to run.

Knative Serving

Before introducing Knative in detail, let’s use a Web application to compare traffic access and application release in plain Kubernetes with traffic access and application release in Knative. As shown in the figure below, the Kubernetes mode is on the left and the Knative mode is on the right.

  • In Kubernetes mode: 1. users need to manage the Ingress Controller themselves; 2. the relationship between the Ingress and the Service must be maintained to expose the Service externally; 3. if grayscale observation is required at release time, the upgrade must be completed by rotating multiple Deployments (see the sketch after this list);
  • In Knative mode: users only need to maintain a single Knative Service resource.
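To make the contrast concrete, here is a rough sketch of the resources you would maintain yourself in plain Kubernetes for such a Web application (the names, image, and ports are illustrative and not taken from the demo later in this article):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-v1                # one Deployment per version, rotated manually on upgrade
spec:
  replicas: 2
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: web
          image: example/web:v1
          ports:
            - containerPort: 8080
---
apiVersion: v1
kind: Service
metadata:
  name: web                   # Service selects the Pods of every version
spec:
  selector:
    app: web
  ports:
    - port: 80
      targetPort: 8080
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: web                   # Ingress must be kept in sync with the Service to expose it
spec:
  rules:
    - host: web.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: web
                port:
                  number: 80

In Knative mode, the Deployment, Service, and Ingress relationships and the manual version rotation collapse into one Knative Service, as the coffee example later in this article shows.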

Of course, Knative cannot completely replace Kubernetes; Knative is built on top of Kubernetes’ capabilities. Besides the difference in which resources users manage directly, Kubernetes and Knative also differ greatly in philosophy:

Kubernetes’ role is to decouple IaaS from applications and reduce the cost of allocating IaaS resources; it focuses on orchestrating IaaS resources. Knative, by contrast, leans toward the application layer and puts elasticity at the core of application orchestration.

Knative is a Serverless orchestration engine built on Kubernetes. Its goal is to define a cloud native, cross-platform Serverless orchestration standard, which it implements by integrating container building, workload management (dynamic scaling), and an event model. Serving is the core module that runs Serverless workloads.

  • Application hosting
    • Kubernetes is an abstraction for IaaS management; deploying an application with Kubernetes means maintaining many resources yourself
    • With Knative, a single Knative Service resource defines the hosting of an application
  • Traffic management
    • Knative routes application traffic through a Gateway and can then split it by percentage, which lays the foundation for elasticity, grayscale release, and other capabilities
  • Grayscale release
    • Multi-version management is supported, so it is easy to keep multiple versions of an application serving online at the same time
    • Different versions can be assigned different traffic percentages, so grayscale release and similar features are easy to implement (see the sketch after this list)
  • Elasticity
    • Elasticity is Knative’s core ability to help applications save costs: it automatically scales out when traffic rises and scales in when traffic falls
    • Each grayscale version has its own elasticity policy, which is tied to the traffic allocated to that version; Knative makes scale-out and scale-in decisions based on the traffic each version receives
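As a minimal sketch of how a single Knative Service ties traffic splitting, grayscale versions, and per-version elasticity together (the service name and percentages are illustrative; the image is the sample used in the demo later in this article):

apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: myapp
spec:
  template:
    metadata:
      name: myapp-v2                          # the new revision being rolled out
      annotations:
        autoscaling.knative.dev/target: "10"  # elasticity policy for this revision
    spec:
      containers:
        - image: registry.cn-hangzhou.aliyuncs.com/knative-sample/helloworld-go:160e4dc8
  traffic:
    - revisionName: myapp-v1                  # the existing revision keeps most of the traffic
      percent: 80
    - revisionName: myapp-v2                  # the new revision receives a small grayscale share
      percent: 20

As traffic shifts between myapp-v1 and myapp-v2, each revision scales independently according to its own target and the share of requests it actually receives.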

Learn more about Knative here or here.

Why ASK

Community Kubernetes requires you to purchase hosts in advance and register them as Kubernetes nodes before Pods can be scheduled. Purchasing hosts in advance does not really match the application’s logic: application developers just want IaaS resources to be allocated when an application instance needs to run, and they do not want to maintain complex IaaS resources. So if there were a Kubernetes that is fully compatible with the community Kubernetes API, yet does not require you to operate and manage complex IaaS resources yourself and can automatically allocate resources when needed, it would fit the application’s view of resource usage much better. ASK was built on exactly this idea to bring you a Serverless Kubernetes experience.

ASK (Alibaba Cloud Serverless Kubernetes) is a Serverless Kubernetes cluster. Users can deploy container applications directly without purchasing nodes, without node maintenance or capacity planning for the cluster, and pay on demand according to the CPU and memory resources configured for the application. An ASK cluster provides full Kubernetes compatibility while greatly lowering the barrier to using Kubernetes, allowing users to focus on the application rather than on managing the underlying infrastructure.

This means that you can create a Kubernetes cluster without having to prepare ECS resources in advance, and then deploy your own services. See here for more details on ASK.

The main thread summarized in the evolution above is that the coupling between IaaS and applications keeps getting looser: the underlying platform only needs to allocate the corresponding IaaS resources when an application needs to run, and the application owner is just a consumer of IaaS who does not need to know the details of IaaS allocation. ASK is the platform that can allocate IaaS resources at any time, and Knative is responsible for sensing the real-time state of the application and automatically “requesting” IaaS resources (Pods) from ASK when needed. Combining Knative with ASK gives you a more extreme Serverless experience.

For a more in-depth introduction to ASK, see Serverless Kubernetes – Ideals, Realities, and Futures.

Highlights

SLB-based Gateway

By default, the Knative community supports multiple Gateway implementations, such as Istio, Gloo, Contour, Kourier, and Ambassador. Istio is certainly the most popular of these, because besides acting as a Gateway it can also serve as a ServiceMesh. These gateways are full-featured, but for a Serverless service they somewhat defeat the purpose: you need Gateway instances running permanently, with at least two of them backing each other up for high availability, and the control plane of these gateways also has to run permanently. The IaaS fees and operations burden of these resident instances are costs the business has to bear.

To give users an extreme Serverless experience, we implemented the Knative Gateway on Alibaba Cloud SLB. It provides all the required functionality with cloud-product-level support, and eliminating the resident resources saves both IaaS costs and a lot of operational overhead.

Low-cost reserved instances

Reserved instances are a feature unique to ASK Knative. By default, community Knative scales an application down to zero when there is no traffic, but that makes the zero-to-one cold start problem hard to solve. Besides IaaS resource allocation, Kubernetes scheduling, and image pulls, a cold start also includes the application’s own startup time, which ranges from milliseconds to minutes and is almost impossible to control at the platform level. These issues exist in every Serverless product. Traditional FaaS products usually run different functions on a shared IaaS pool; to keep the pool from being exhausted and keep cold start time low, FaaS products place various restrictions on user functions, such as:

  • Request processing timeout: a request that does not complete within this time is considered failed;
  • Concurrency limit: by default every function has a concurrency cap, and requests beyond this cap are throttled;
  • CPU and memory: a function cannot exceed the CPU and memory upper limits.

ASK Knative addresses this by balancing cost against cold start with low-priced reserved instances. Alibaba Cloud ECI offers many instance specifications, and different specifications have different compute power and different prices. The price comparison below is between a standard compute instance and a burstable performance instance, both with a 2C4G configuration.

As the comparison shows, burstable performance instances are 46% cheaper than standard compute instances. Using burstable performance instances to serve traffic when there is none or very little therefore not only solves the cold start problem but also saves a lot of cost.

Besides the price advantage, the most eye-catching feature of burstable performance instances is CPU credits. A burstable performance instance can use CPU credits to cope with bursts of load: it continuously accrues CPU credits, and when its baseline performance cannot meet the load, it consumes the accumulated credits to seamlessly raise its compute performance without affecting the environment or applications deployed on the instance. CPU credits let you allocate compute resources from an overall business perspective, seamlessly shifting leftover compute capacity from off-peak periods to peak periods (think of it as a gasoline-electric hybrid ☺️). See here for more details on burstable performance instances.

ASK Knative’s strategy is therefore to replace standard compute instances with burstable performance instances during business troughs, and to seamlessly switch back to standard compute instances when the first request arrives. This lowers your cost during traffic troughs, and the CPU credits accumulated during the troughs can be consumed during business peaks, so every penny you pay is put to good use.

Using burstable performance instances as reserved instances is only the default policy; you can specify any other instance type as the specification for your reserved instances. You can of course also specify standard instances as the reserved instance specification, which in effect turns the reserved-instance behavior off.
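For example, a sketch using the annotation demonstrated later in this article (the spec value here is illustrative; ecs.c5.large is a standard 2C4G compute instance, so reserving it effectively disables the burstable-instance swap):

apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: myapp
spec:
  template:
    metadata:
      annotations:
        # reserve a standard compute spec instead of the default burstable spec
        knative.aliyun.com/reserve-instance-eci-use-specs: "ecs.c5.large"
    spec:
      containers:
        - image: registry.cn-hangzhou.aliyuncs.com/knative-sample/helloworld-go:160e4dc8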

Demo

After a Serverless Kubernetes (ASK) cluster is created, you can apply to have Knative enabled through the DingTalk group below, and then use the capabilities Knative provides directly in the ASK cluster.

When Knative is enabled, a Service named ingress-gateway is created in the knative-serving namespace. It is a LoadBalancer type Service, and an SLB instance is automatically created for it through CCM. As shown below, 47.102.220.35 is the public IP address of the SLB, and you can access Knative services through this public IP.

# kubectl -n knative-serving get svc
NAME              TYPE           CLUSTER-IP    EXTERNAL-IP     PORT(S)        AGE
ingress-gateway   LoadBalancer   172.19.8.35   47.102.220.35   80:30695/TCP   26h

Our next series of operations starts with a coffee shop. Suppose this coffee shop sells two categories of drinks: coffee and tea. We will deploy the coffee service first and then the tea service, and along the way demonstrate features such as version upgrades, traffic grayscale, custom Ingress, and automatic elasticity.

Deploying the Coffee Service

Save the following to a coffee.yaml file and deploy it to the cluster with kubectl:

# cat coffee.yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: coffee
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/target: "10"
    spec:
      containers:
        - image: registry.cn-hangzhou.aliyuncs.com/knative-sample/helloworld-go:160e4dc8
          env:
            - name: TARGET
              value: "coffee"

Execute kubectl apply -f coffee.yaml to deploy the coffee service. After a short while, you should see that the coffee service has been deployed:

# kubectl get ksvc
NAME     URL                                 LATESTCREATED   LATESTREADY    READY   REASON
coffee   http://coffee.default.example.com   coffee-fh85b    coffee-fh85b   True

In the output above, coffee.default.example.com is the subdomain that Knative generates for each ksvc. With the curl command and the SLB public IP address you can access the service; the following is what the coffee service returns:

# curl -H "Host: coffee.default.example.com" http://47.102.220.35
Hello coffee!

Automatic elasticity

The Autoscaler is a first-class citizen in Knative and the core of Knative’s ability to help users save costs. Knative’s default KPA elasticity policy automatically adjusts the number of Pods based on real-time request traffic. Let’s get a feel for Knative’s elasticity, starting with the current Pod information:

# kubectl get pod
NAME                                       READY   STATUS    RESTARTS   AGE
coffee-bwl9z-deployment-765d98766b-nvwmw   2/2     Running   0          42s

As you can see, one Pod is currently running, and we are ready for the load test. Before starting it, let’s review the coffee.yaml shown earlier, which contains the configuration below: autoscaling.knative.dev/target: "10" means each Pod handles at most 10 concurrent requests; if there are more than 10 concurrent requests, new Pods should be added to take them. See here for a more detailed description of the Knative Autoscaler.

# cat coffee.yaml
...
    metadata:
      annotations:
        autoscaling.knative.dev/target: "10"
    spec:
      containers:
        - image: registry.cn-hangzhou.aliyuncs.com/knative-sample/helloworld-go:160e4dc8
...

All right, let’s start the load test now, using the hey command (binaries are available for macOS, Linux, and Windows):

hey -z 30s -c 90 --host "coffee.default.example.com" "http://47.100.6.39/?sleep=100"

What this command means:

  • -z 30s means the load test runs for 30 seconds;
  • -c 90 means 90 concurrent requests are used;
  • --host "coffee.default.example.com" sets the Host header;
  • "http://47.100.6.39/?sleep=100" is the request URL, where sleep=100 makes the test image sleep for 100 milliseconds, simulating a real online service.

Run the command above to start the load test and observe how the number of Pods changes while it runs; you can keep refreshing the Pod list in a second terminal, as shown below. The GIF below shows the Pod changes during the load test: when the load arrives, Knative automatically scales out, so the number of Pods increases; when Knative detects that traffic has dropped after the load test, it automatically scales back in. The whole scale-out and scale-in process is fully automatic.
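To follow the scaling in real time, keep refreshing the Pod list in a second terminal while hey is running:

watch -n 1 'kubectl get pod'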

Reserved instances

The highlights section above explained that ASK Knative uses reserved instances to solve the cold start and cost problems. Now let’s look at how the switch between reserved instances and standard instances works.

After the service has been idle for a while, run kubectl get pod and you will see that the remaining Pod’s name contains “reserve”, which marks it as a reserved instance. At this point the service is being provided by the reserved instance: when there are no online requests for a long time, Knative automatically scales the reserved instance out and scales the standard instances down to zero, achieving the goal of saving cost.

# kubectl get pod
NAME                                               READY   STATUS    RESTARTS   AGE
coffee-bwl9z-deployment-reserve-85fd89b567-vpwqc   2/2     Running   0          5m24s

What happens when traffic comes in again? Let’s test it. As the GIF below shows, when traffic arrives, a standard instance is automatically scaled out, and once the standard instance is ready, the reserved instance is scaled back down.
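A simple way to reproduce this yourself is to send a request through the SLB address used earlier and, in another terminal, watch the standard Pod come up while the reserve Pod goes away (both commands are reused from the demo above):

curl -H "Host: coffee.default.example.com" http://47.102.220.35
watch -n 1 'kubectl get pod'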

By default, reserved instances use the ecs.t5-lc1m2.small (1C2G) specification. Some applications, however, allocate memory at startup by default (the JVM, for example), so an application that needs 4 GB of memory may want ecs.t5-c1m2.large (2C4G) as its reserved instance specification. We therefore also provide a way for users to specify the reserved instance specification through an annotation on the Knative Service, for example knative.aliyun.com/reserve-instance-eci-use-specs: ecs.t5-lc2m1.nano, which means ecs.t5-lc2m1.nano is used as the reserved instance specification. Save the following to coffee-set-reserve.yaml:

# cat coffee-set-reserve.yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: coffee
spec:
  template:
    metadata:
      annotations:
        knative.aliyun.com/reserve-instance-eci-use-specs: "ecs.t5-c1m2.large"
        autoscaling.knative.dev/target: "10"
    spec:
      containers:
        - image: registry.cn-hangzhou.aliyuncs.com/knative-sample/helloworld-go:160e4dc8
          env:
            - name: TARGET
              value: "coffee-set-reserve"

Execute kubectl apply -f coffee-set-reserve.yaml to submit it to the Kubernetes cluster. Wait a moment, then check the Pod list after the new version has been scaled down to a reserved instance:

# kubectl get pod
NAME                                               READY   STATUS    RESTARTS   AGE
coffee-vvfd8-deployment-reserve-845f79b494-lkmh9   2/2     Running   0          2m37s

Looking at the reserved instance of the coffee-set-reserve version, you can see that it now uses the ecs.t5-c1m2.large specification, which is 2C4G:

# kubectl get pod coffee-vvfd8-deployment-reserve-845f79b494-lkmh9 -oyaml |head -20
apiVersion: v1
kind: Pod
metadata:
  annotations:
    . .
    k8s.aliyun.com/eci-instance-cpu: "2.000"
    k8s.aliyun.com/eci-instance-mem: "4.00"
    k8s.aliyun.com/eci-instance-spec: ecs.t5-c1m2.large
    . .

Upgrading the Coffee Service

Before upgrading, let’s take a look at the current Pod:

# kubectl get pod
NAME                                               READY   STATUS    RESTARTS   AGE
coffee-fh85b-deployment-8589564f7b-4lsnf           1/2     Running   0          26s

Now let’s upgrade our coffee service. Save the following to coffee-v1.yaml:

# cat coffee-v1.yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: coffee
spec:
  template:
    metadata:
      name: coffee-v1
      annotations:
        autoscaling.knative.dev/target: "10"
    spec:
      containers:
        - image: registry.cn-hangzhou.aliyuncs.com/knative-sample/helloworld-go:160e4dc8
          env:
            - name: TARGET
              value: "coffee-v1"

There are two additions to the current deployed version:

  • The revision being deployed is given the name coffee-v1 (it is generated automatically if not set);
  • The TARGET environment variable is set to coffee-v1, so the HTTP response reflects the current service version. Run kubectl apply -f coffee-v1.yaml to deploy v1, and after the deployment verify it with curl -H "Host: coffee.default.example.com" http://47.102.220.35.

After a few seconds you will see that Hello coffee-v1! is returned. During this process the service is never interrupted and no manual switchover is needed: simply submitting the modified configuration makes the new version and the old version instances switch over automatically.

# curl -H "Host: coffee.default.example.com" http://47.102.220.35
Hello coffee-v1!

Now let’s look at the Pod status again. You can see that the Pod has been switched: the old Pod was automatically replaced by the new one.

# kubectl get pod
NAME                                    READY   STATUS    RESTARTS   AGE
coffee-v1-deployment-5c5b59b484-z48gm   2/2     Running   0          54s

For more complex demonstrations, please go here.

Conclusion

Knative is the most popular Serverless orchestration framework in the Kubernetes ecosystem. Community-native Knative needs a resident controller and a resident gateway to provide its services; besides the IaaS costs, these resident instances also bring a heavy operations burden, which makes adopting Serverless harder. So we fully host Knative Serving in ASK. Out of the box, these resident instances cost you nothing. Besides providing the Gateway through the SLB cloud product, we also offer reserved instances based on burstable performance instances, which lets your services dramatically reduce IaaS expenses during traffic troughs, and the CPU credits accumulated during the troughs can be consumed during traffic peaks. Every penny you pay is put to good use.

For more information on Knative, see here or here.

References

  • Knative official documentation
  • Knative samples
  • Knative Autoscaler configuration details
  • Knative on ASK demo
  • ECS burstable performance instances: switching the performance mode
  • Alibaba Cloud Serverless Kubernetes (ASK) cluster introduction
  • Overview of ECS burstable performance instances
  • Serverless Kubernetes – Ideals, Reality and the Future

Extras

Under the hood, Knative on ASK runs on ECI, and you can currently get vouchers worth 100 for a free trial.

Trial address: www.aliyun.com/daily-act/e…

Recommended course

To help more developers enjoy the dividends brought by Serverless, we have gathered more than ten Alibaba technical experts in the Serverless field to create a Serverless course that developers can learn from and apply immediately, making it easy to embrace Serverless, the new paradigm of cloud computing.

Click here for the free course: developer.aliyun.com/learning/ro…

“Alibaba Cloud Native focuses on technical fields such as microservices, Serverless, containers, and Service Mesh, follows cloud native trends and large-scale cloud native practice, and is the official account that best understands cloud native developers.”