By yugasun, from yugasun.com/post/server… This article may be reproduced in full, provided the original author and source are retained.

Before reading this article, you should be familiar with the concept of Serverless. If you don’t already know what Serverless is, you may want to read the previous articles in this series. In those articles, the author focused on how to migrate traditional services to Serverless and on hands-on Serverless experience, such as how to build a back-office management system. However, in real development we not only need to consider how to migrate our services to a Serverless architecture, but also need to understand how Serverless differs from traditional architectures, so that we can build more efficient and stable services in practice. This article summarizes several important pieces of knowledge and experience the author has accumulated over two years of Serverless development, and hopefully it will be helpful to readers.

The content is divided into two parts: theory and practice. The theory part introduces some concepts related to Serverless; the practice part then draws on that theory to summarize development suggestions for real projects.

Main contents of this article:

  1. How Serverless scales automatically
  2. Cold start principle and optimization methods
  3. How to plan service function granularity
  4. How to choose the right function type

How Serverless scales automatically

This article takes Knative as an example to explain how a Serverless platform scales automatically.

Knative is Google’s open-source Serverless solution built on Kubernetes, designed to provide a consistent, standard pattern for building and deploying Serverless and event-driven applications.

Knative itself implements the KPA (Knative Pod Autoscaler) algorithm for automatic scaling. The basic idea of KPA is that the system scales the number of instances up or down according to the currently observed traffic (the number of concurrent requests). A simplified formula is:

Expected number of instances = current number of concurrent requests / number of concurrent requests supported per instance

For example 🌰, suppose the system detects 100 concurrent requests and a single instance supports 10 concurrent requests; the expected number of instances is then 100 / 10 = 10. If only 1 instance is currently available, the system automatically adds 9 instances until the expected count of 10 is reached. Until those 9 instances are ready, Knative buffers the requests in the Activator, and only forwards them to FaaS (where the function code handles the requests) once scaling is complete (the 9 instances are ready). The period spent waiting for instances to become ready is commonly called a cold start, which we discuss next.

Note: in practice, to reduce cold starts, cloud vendors scale out to more instances than strictly expected, according to their own allocation rules and algorithms, e.g. actual number of instances = 1.2 * expected number of instances. That way, when the next monitoring cycle arrives, even if the number of concurrent requests has risen to 120, the currently active instances are already sufficient. More on this later.
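To make the formula concrete, here is a minimal TypeScript sketch of the calculation described above. All names (currentConcurrency, perInstanceConcurrency, burstFactor) are illustrative, and the 1.2 buffer is only the example ratio from the note, not a value any specific platform guarantees.

```typescript
// Illustrative sketch of the KPA-style scaling formula described above.
// The names and the 1.2 burst factor are assumptions for demonstration only.
function desiredInstances(
  currentConcurrency: number,     // concurrent requests observed this cycle
  perInstanceConcurrency: number, // concurrency one instance can handle
  burstFactor = 1.2               // vendor-specific over-provisioning ratio (example value)
): number {
  const expected = Math.ceil(currentConcurrency / perInstanceConcurrency);
  return Math.ceil(expected * burstFactor);
}

// 100 concurrent requests, 10 per instance -> 10 expected, 12 after the buffer
console.log(desiredInstances(100, 10));
```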

Cold start principle and optimization methods

As we all know, Serverless platforms provide developers with a diverse and convenient runtime for their code. During development, developers do not need to care about preparing the runtime environment, because by the time the FaaS function runs, the Serverless platform has already prepared it for us. However, starting a Serverless container instance is not instantaneous; it takes time to prepare. The instance startup process can be roughly divided into four stages:

Download the code -> Start the instance -> Initialize the code (load code and dependencies) -> Execute the function

Note: For image-based functions, the first step is pulling the image instead.

A cold start goes through all four of the above steps. When a warm container is already available and no scaling is needed, the function executes directly, skipping the first three steps; this is called a warm (hot) start.

A cold start usually takes hundreds of milliseconds, while a warm start usually takes only milliseconds because the first three steps are skipped. So we should minimize cold starts as much as possible. Optimizing function cold starts has two dimensions: the probability of a cold start and the duration of a cold start. The following sections explain each in turn.

Probability of cold start

There are two main ways to reduce the probability of cold starts: instance reuse and instance pre-warming.

1. Instance reuse

When a function instance finishes executing, it is not recycled immediately; instead it moves from the Active state to a Reserve (standby) state, waiting for the next request. A standby instance is kept for a configurable duration, such as 30 minutes; if no request needs to be handled during that time, the instance is reclaimed. If a request does arrive, the instance returns to the Active state and handles it, which is how instance reuse works.

However, if the standby duration is set too long, standby instances will not be reclaimed for a long time, which wastes resources and significantly increases the cloud vendor’s costs. So each cloud vendor tries to choose an optimal value based on its own product and optimization strategy.

2. Instance pre-warming

Instance pre-warming can be divided into two kinds: passive pre-warming and active pre-warming. Whether it counts as passive or active usually depends on whether it involves an action taken by the user (the developer).

Passive pre-warming means that, during scale-out, the cloud vendor expands to a number of instances larger than the current expectation based on the actual situation; the exact ratio is determined by the vendor.

Active pre-warming means that the developer relies on the Serverless platform’s reserved-instance capability and proactively configures a number of reserved instances that will not be released. When a request arrives, a reserved instance can serve it directly, effectively reducing the probability of a cold start.

As mentioned above, if instances are never reclaimed, the cost of Serverless rises significantly. Cloud vendors therefore charge for reserved instances, so developers need to evaluate, based on their own service characteristics, whether they need reserved instances and how many.

Scheduled pre-warming 🔥

The methods above for reducing the probability of cold starts all depend on capabilities provided by the Serverless platform; even active pre-warming can only be configured once the platform supports reserved instances. In fact, we can also warm instances ourselves by invoking the function proactively: since an instance is only reclaimed after it has been idle for a while, triggering the function periodically keeps instances alive and warm.

For example, when we create a Serverless function (FaaS), we can also configure a timer trigger for it (FaaS usually supports many trigger types, and the timer trigger is one of them). The trigger can invoke the function, say, every 5 minutes (configurable), which implements scheduled pre-warming. Depending on the service, multiple timer triggers can be set up to handle different concurrency scenarios.

However, FaaS is billed according to the memory used and the execution time, so scheduled invocations do cost money and should not be used carelessly. You should also handle the scheduled invocation specially in the FaaS code, for example by checking that the event type is a timer trigger and returning as early as possible, so that the function runs for a shorter time and the cost stays low.
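Below is a minimal TypeScript sketch of this idea. The event shape (the `Type: 'Timer'` field and the other field names) is an assumption for illustration; check your platform’s documentation for the actual timer event structure.

```typescript
// Hypothetical event shapes; real field names vary by cloud vendor.
interface TimerEvent { Type: 'Timer'; TriggerName?: string }
interface ApiGatewayEvent { httpMethod: string; path: string; body?: string }
type FaasEvent = TimerEvent | ApiGatewayEvent;

export async function handler(event: FaasEvent) {
  // Warm-up invocation: return immediately to keep the billed duration minimal.
  if ((event as TimerEvent).Type === 'Timer') {
    return { warmup: true };
  }

  // Normal business logic for real requests goes here.
  return { statusCode: 200, body: 'hello' };
}
```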

Note: Cloud vendors generally limit the number of triggers that can be configured for a single function.

Cold start time

For the duration of a cold start, the first three steps can be optimized: downloading the code, starting the instance, and initializing the code. Instance startup time mainly depends on the platform’s underlying optimizations, but downloading the code and initializing the code are under our control as developers, after all, we wrote the code ourselves.

1. Optimize code size

On the same network, the time needed to download the code depends on the size of the FaaS code package: the larger the package, the longer the download.

Many developers, when they first start Serverless development, upload and deploy the whole project to FaaS. As a result, the code deployed to FaaS contains a lot of unnecessary code and dependencies, so the FaaS package is much larger than the actual business code, the download step takes longer, and naturally the cold start time grows as well.

For a Node.js project, dependencies are installed into the node_modules directory, and they include both production dependencies and devDependencies. Both are needed for local development (for example, the Jest unit-testing module), but in a production environment the development dependencies are not required.

To save trouble, many developers simply run npm install during deployment, which installs both kinds of dependencies. In many projects the development dependencies are even larger than the production dependencies, which directly makes the FaaS package too big. In fact, before deploying to FaaS, you can run npm install --production to install only the production dependencies, which can greatly reduce the size of the deployed FaaS code.

In addition, some projects need to be compiled, such as TypeScript projects. We can also use bundlers (Webpack, Rollup) to bundle and minify the project code, and even remove unused code via the bundler’s tree-shaking capability, which effectively reduces the FaaS code size.
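As a rough illustration, a Webpack configuration along the following lines can produce a single minified bundle for a Node.js function. The entry path, output name, and loader choice (ts-loader) are placeholders and assumptions; the exact options depend on your project.

```typescript
// webpack.config.ts - illustrative bundling setup for a Node.js FaaS handler.
// Paths and options are placeholders; adjust them for your own project.
import type { Configuration } from 'webpack';
import path from 'path';

const config: Configuration = {
  mode: 'production',           // enables minification and tree shaking
  target: 'node',               // bundle for the Node.js runtime, not the browser
  entry: './src/index.ts',      // function entry file (placeholder path)
  output: {
    path: path.resolve(__dirname, 'dist'),
    filename: 'index.js',
    libraryTarget: 'commonjs2', // export the handler in CommonJS form
  },
  module: {
    rules: [{ test: /\.ts$/, use: 'ts-loader', exclude: /node_modules/ }],
  },
  resolve: { extensions: ['.ts', '.js'] },
};

export default config;
```

Deploying only the bundled output (instead of the source tree plus node_modules) keeps the package that FaaS has to download as small as possible.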

2. Reduce unnecessary dependencies

Here is a test of how the number of dependency modules in FaaS code affects cold start time (image from the article Serverless: Cold Start War):

As can be seen from the figure above, the more dependencies a function has, the longer its cold start takes. More dependency packages mean a larger code package, so the code-download step takes longer, and the code-initialization step takes longer as well.

So reducing unnecessary dependencies becomes necessary. The discussion of optimizing code size above already touched on how to trim project dependencies. Beyond those methods, developers can also minimize the introduction of unnecessary module dependencies when writing business code.

For example, in front-end development, the project’s dependency directory node_modules is like a black hole: before you know it, it is often hundreds of megabytes, or even gigabytes, in size. This is because npm originally installed modules in a tree structure, where each third-party module installed its own dependencies into its own node_modules directory, so the same module could be installed multiple times by different packages. npm later flattened the dependency structure, but when different versions of the same third-party module are required, duplicates are still installed. In addition, the quality of third-party modules in the open-source community is uneven; sometimes a simple type check ends up being implemented by installing a third-party module, and very often we only use a small part of a module’s functionality.

When developing a project, we should think more about whether a simple function can be implemented by ourselves, rather than always reaching for an open-source third-party module. Implementing it yourself may take a little more time, but it helps you improve faster as a programmer and reduces unnecessary module dependencies.
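As a trivial example of the “simple type check” case mentioned above, a check like the following can live directly in your own code instead of being pulled in as a micro-dependency (the kind of “is-plain-object”-style package alluded to here is only an illustration):

```typescript
// A hand-rolled check instead of installing a tiny utility package for it.
function isPlainObject(value: unknown): value is Record<string, unknown> {
  return Object.prototype.toString.call(value) === '[object Object]';
}

console.log(isPlainObject({ a: 1 })); // true
console.log(isPlainObject([1, 2]));   // false
```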

How to plan service function granularity

When developing Serverless back-end services, I often faced a function-granularity question: should I use one function or multiple functions?

Currently, Serverless platforms provide good support for traditional Web services and also offer very convenient migration paths, such as Tencent Cloud’s Serverless Components, AWS SAM, and Alibaba Cloud’s Serverless Devs. All of them can quickly and easily deploy a Web-framework-based service to FaaS with just a YAML configuration.

But they all deploy the entire Web service into one and the same function, which raises the following questions:

  1. What should we do if the Web service has an interface that requires long computation or a lot of memory?
  2. What should we do if the Web service’s code size exceeds the cloud vendor’s limit (typically 500 MB)?

Following the microservices principle of splitting what should be split, interfaces that run for a long time or need a lot of memory deserve separate consideration. FaaS is billed by execution time and memory size per invocation; if the service is not split, the memory configuration of the whole function has to match the interface that needs the most memory, which directly drives up the overall cost. Therefore, such special interfaces should be split out into separate functions whenever possible.
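A back-of-the-envelope sketch of why this matters follows. Every number in it (request volume, duration, memory sizes, the 1% ratio) is invented purely for illustration, and actual pricing formulas vary by vendor; the point is only that billing scales with memory × time per invocation.

```typescript
// Illustrative GB-second comparison (all numbers are made up for demonstration).
// Assume 1,000,000 requests/month, 100 ms average duration, and that only 1%
// of requests hit the memory-hungry interface.
const requests = 1_000_000;
const durationSec = 0.1;

// Unsplit: every invocation pays for the large (1024 MB) memory configuration.
const unsplitGbSeconds = requests * durationSec * (1024 / 1024);

// Split: 99% of invocations run at 128 MB, 1% at 1024 MB.
const splitGbSeconds =
  requests * 0.99 * durationSec * (128 / 1024) +
  requests * 0.01 * durationSec * (1024 / 1024);

console.log({ unsplitGbSeconds, splitGbSeconds }); // 100000 vs. 13375 GB-seconds
```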

A Web service whose code size exceeds the cloud vendor’s FaaS limit (assuming code size and dependencies have already been optimized) must be split if it is to be deployed on FaaS.

So the answer to both questions above is to split the service. How to split a service has always been a hot topic in software development, and this article will not discuss it in depth; you can refer to the practices of microservice decomposition.

How to choose the right function type

Before choosing a function type, let’s take a look at the function types supported by Serverless platforms so far.

Event type

FaaS originally ran as event-type functions: a trigger fires an event, and the FaaS function then handles it. When a traditional Web service is migrated to an event function, the API Gateway trigger event has to be converted in the function’s entry file; for details, see the earlier article on how to migrate Web frameworks to Serverless. The flow chart is as follows:

One drawback of this architecture is that every request has to pass through the adaptation layer, which adds latency. Because the API Gateway and the function exchange information as JSON structures, native file transfer is not supported: to transfer files, they must be encoded (for example with Base64) on the API Gateway side and decoded on the function side.
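A rough sketch of such an adaptation step is shown below; the event field names (`body`, `isBase64Encoded`, `headers`) follow the common API-gateway-event convention and are assumptions here, since the exact structure differs between vendors.

```typescript
// Hypothetical API gateway event shape; real field names vary by vendor.
interface ApiGatewayEvent {
  path: string;
  httpMethod: string;
  headers: Record<string, string>;
  body?: string;
  isBase64Encoded?: boolean;
}

export async function handler(event: ApiGatewayEvent) {
  // The gateway delivers binary payloads as Base64 text inside the JSON event,
  // so the function has to decode them back into bytes before handling them.
  const payload = event.isBase64Encoded && event.body
    ? Buffer.from(event.body, 'base64')
    : Buffer.from(event.body ?? '', 'utf8');

  // ...hand the decoded request over to the Web framework's adapter here...

  return {
    statusCode: 200,
    headers: { 'content-type': 'text/plain' },
    body: `received ${payload.length} bytes`,
  };
}
```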

For details, refer to Tencent Cloud’s official documentation on event functions.

HTTP type

To make up for the shortcomings of event-type functions, cloud vendors have launched Web-oriented function types (Tencent Cloud calls them Web functions, Alibaba Cloud calls them HTTP functions), in which the API Gateway connects to the FaaS instance directly over HTTP. When developers migrate Web services to FaaS, they no longer need to add adaptation-layer code to convert JSON, which shortens the actual request path and gives better Web performance than event functions. Moreover, developers no longer need to modify the entry file; they simply start the service by listening on a port, just as in traditional Web application development. Refer to Tencent Cloud’s official documentation on Web functions.
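For instance, a minimal Express service like the sketch below can be deployed to a Web/HTTP-type function essentially unchanged. The port value is a placeholder: the listening port is dictated by the platform, so verify it (or the environment variable that carries it) in your vendor’s documentation.

```typescript
// A plain Express app; with Web/HTTP-type functions it can be deployed
// without any adaptation layer. The port is a placeholder value.
import express from 'express';

const app = express();

app.get('/', (_req, res) => {
  res.send('hello from a Web-type function');
});

const port = Number(process.env.PORT) || 9000; // platform-specific; check vendor docs
app.listen(port, () => {
  console.log(`listening on ${port}`);
});
```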

For a more detailed comparison of parameters, please refer to the article “Revisiting Web Function – Using Data to Explain Advantages”.

Image functions

Functions deployed as code rely on the execution environment pre-provisioned by the cloud vendor, and this causes a problem: some special business needs, especially functions related to video processing, require extra low-level system libraries to be installed (the pre-installed environment already covers most low-level dependencies, but still cannot satisfy every case), yet developers cannot install these dependencies into the runtime environment provided by the vendor themselves.

In response to the need for a custom runtime environment, cloud vendors provide FaaS deployment based on user-defined images (only the deployment mode changes; the function type is still either event or HTTP). With custom-image deployment, applications that rely on special low-level libraries can be deployed to FaaS easily.

While image deployment is convenient, it has a drawback: in most cases the cold start takes longer. Because an image is usually larger than a code package, pulling the image is much slower than downloading the code, so the cold start time grows. Of course, image size can also be optimized; for example, the base image of a Node.js application can be trimmed to a few dozen megabytes. But if it is a lightweight Node.js application that needs no special dependencies (such as Puppeteer, TensorFlow, FFmpeg…), there is no need to use image deployment at all; deploying as code is much more convenient.

Choose the function type that suits you

The characteristics of the two FaaS types can be inferred from their names: the event type is better suited to event-driven business scenarios, such as video stream processing or consuming Kafka messages, while the HTTP type focuses on Web application scenarios. As for the deployment mode, the author mainly recommends code deployment, and only resorts to image deployment when a customized runtime environment leaves no other choice. After all, image deployment also depends on an image registry service, and the enterprise edition of the container registry (cloud vendors do not provide an SLA guarantee for the personal edition) is quite expensive; on top of that, the cold start time of image deployment is longer.

Developers can flexibly choose their FaaS types and deployment modes according to their own needs; the analysis above is for reference only.

References

  1. In-depth analysis of Knative basic functions: Knative Serving Autoscaler
  2. I’ve been talking about cold start, what is it?
  3. Serverless: Cold Start War