Originally posted on my blog, lambdas.dev. Feedback and discussion are welcome.

The content of this article is not limited to front-end deployment; it applies to other languages and frameworks as well.


We all know that deploying services to Serverless in suitable scenarios has great advantages over traditional services: lower cost, flexible scaling, high scalability, high stability, and so on. Rather than going into those details, we will talk about how to deploy services to Serverless sensibly in a real business. Serverless in this article refers to the FaaS products suitable for deploying services; most cloud platforms provide FaaS support, such as AWS Lambda and Azure Functions.

Due to the characteristics of Serverless, we need to build modules on a fixed programming model, which is a huge hindrance both to porting existing projects and to building new ones. At the same time, integration with existing services, the difficulty of deployment and maintenance, and standardization are all worth considering. There are many rapid deployment services built on cloud-platform Serverless on the market, and they have done a lot to smooth over differences and improve the experience; unfortunately, there is no purely open-source system of this kind. Drawing on the design experience of those Serverless deployment platforms, this article proposes a design for a production-ready engineering deployment platform based on Serverless, on which we can design and build a modern deployment platform.

Before that, you can read up on the basics of Serverless architecture to understand how it differs from traditional operations architecture. The model I propose below has been put into practice in production and its design details have been refined, but I will try to minimize technical details and focus on general methods.

Advantages

Before we begin, let's discuss the pros and cons of building a dedicated Serverless deployment platform. Some advantages come from the nature of Serverless itself, which shows that migrating to Serverless can be a huge win in the right scenario; others come from the platform design:

  1. High-speed deployment.

    Traditional deployments often take minutes at a time and are difficult to control, let alone maintain and scale. The new platform keeps deployments on the order of seconds, with some projects being able to go from trigger to complete deployment in less than 10 seconds.

  2. Automatic scaling, high availability, and extremely low resource consumption.

    This is an advantage of Serverless itself.

  3. High controllability.

    In the design below we keep multiple function endpoints, rollback and backup simple and reliable.

  4. Lightweight.

    Precisely because deployment is light, fast, and small, large projects get broken down, and modular iteration makes all the work lighter.

  5. Great developer experience.

    Beyond the project-level advantages, the deployment platform guarantees traceability and interruption at any stage, is detachable and easy to maintain, and stays low-friction and pleasant for application developers.

Process overview

We can divide all deployment phases into four steps, from the start of the deployment request to the completion of the deployment:

  1. Local synchronization on the user machine (or CI/CD server machine)
  2. Universal server authentication, processing, and synchronization of build information
  3. Build service packaging and compilation
  4. Deploy the service to Serverless and collect the necessary information
                          [auth]
                            ↓
[user cli]    \
[ci/cd hook]   <--> [general service] --|--> [build service] --> [init shell] --> [builder] --> [serverless bridge]
[website]     /           |             |--> [deploy service] --> [serverless]
                  [build database]      |--> [router]
                                        |--> [log][health][...]
-----------------------------------------
                  [code storage]

On questions of trade-off:

  • Why don’t we compile and deploy in one environment?

    Since Serverless deployment is likely to extend to multiple languages in the future, possibly beyond the front end or even beyond the build capabilities the platform currently needs, we allow users to customize scripts for the build process. The compile step runs in a clean container environment that provides only context; whatever a user does inside it affects only that user's business code, never the deployed environment variables, scripts, or private information. The deploy service, on the other hand, consumes very little CPU and is more robust than the build service. Running it separately not only makes requirements like redeploy easier to meet, but also makes rollback more reliable.

  • What about the communication cost of splitting into multiple services?

    There are communication costs, but they are low. For more complex businesses we may break the steps down even further, yet there is still only a trigger/triggered relationship between services, and communication never blocks for long. Even if we designed all the services into one process, we would only gain shared memory and avoid re-reading the user's source code, which is nothing compared to build time. Besides, because of the code-diff design, the cost of reading and writing the user's source code is low each time. (In fact, even with all the functionality in one process, you would still need to save the user's code for rollback, viewing build results, and so on.)

  • Is it difficult to access third-party authentication?

    Many companies already have their own authentication systems, and more services would mean connecting each one to the authentication system individually and designing different control scopes for different permissions (plus lots of grunt work whenever the upstream changes). That is a headache. In this design, only the general service accesses authentication; the other services neither authenticate users nor are exposed to them, as described below.

Let’s comb through the entire deployment process as shown above:

  1. The client uploads code via the cli, a Git hook, or some other way to a general service (we can also call it the web service). The general service links at least one authentication service and a deployment database, so it can record the User-Project-Deployment relationship.
  2. The general service also diffs the deployed source code, stores the changed files in code storage (data storage), updates the version, and then sends a build notification to the build service.
  3. The build service pulls the specified code by its identifier, preprocesses it, finds a suitable build script among the pool of builders, and finally starts the build.
  4. When a build completes, the build service only sends a notification and begins cleaning up the environment (uploading the built code, clearing build logs, and so on), ready for the next build.
  5. The general service notifies the deploy service that deployment should start.
  6. The deploy service likewise pulls the built code by its identifier, checks it, deploys it to Serverless according to the notification, and configures the specified routes, gateways, and logs.
  7. Finally, the general service notifies the user that deployment is complete and returns the addresses and detailed information.

Collecting deployment code

Collecting source code is the first step of a deployment, whether triggered by a Git hook or the user's local command. Unlike the common approaches of uploading a package, uploading after building, or pushing an image, we pay more attention to the weight of user code and the collection speed. The common solution is to compare each of the user's files or folders against records and upload only the modified files, much like Git does.

If our service holds a description and a hash for every user file, it is easy to tell whether a file has changed. The implementation can walk the directory hierarchy breadth-first, level by level, until all files and their descriptions are collected and transmitted to the server one by one; finally we ask the server to start a deployment. A few common questions remain:

  • User authentication
  • Project attribution
  • Deployment permissions
  • Information about a single deployment, such as the deployment version

The very simple user-project-deployment relationships are easy to implement on the server side and need no elaboration; the only question is when to establish them.

We can treat project maintenance and deployment maintenance as different resources that have nothing to do with the project's source code. In the records, assume a deployment carries no source at all: we only record the simple relationships of user to project (binding projects to users) and deployment version to project (binding each deployment version to its project), and only then start accepting the source upload, storing the source as objects in OSS or another low-frequency database and writing the deployment version as a prefix or description field (not SQL) alongside each source file. This lets you find the deployment version from any source file, and all source code from any version. For example, with multiple files in different folders, source collection proceeds like this:

  1. Synchronize user and project information, create a deployment version, and prepare to upload the source code under that version.

  2. Collect source code:

    2.a Collect files level by level, from shallow directories to deep ones, comparing with the server as you go. If a folder is unchanged, all files in it are discarded.

    2.b For each modified file, create a new description object based on the existing file object (for example, Stats), including the file's relative location.

    2.c Collect information such as the size of the changes and upload all changes in sequence.

    2.d After all changes are uploaded, request the server to lock all files and stop accepting changes.

  3. After verifying the validity of all files, the server writes the file descriptions together with the files into a version set in object storage (for example, OSS).

  4. After receiving the request to deploy that version and the related information, the server sends a notification to the build service and updates the version's record.

Let's start with the purpose of this design. In the code collection part we focus on two points: first, split all the source code into individual files and upload them, recording only their location information, instead of packaging as usual; second, separate the source code from deployment, project, and other business logic. Keeping these two points in mind is key to the deployment service design in this article.

Many deployment platforms and services cannot deploy on demand because they crudely package and upload the whole source tree, especially when triggered by a Git hook. Even when there is a diff, it happens after the network transfer, which has already wasted a lot of transmission time. Imagine changing a project of tens or even hundreds of megabytes and waiting tens of seconds or minutes just to trigger the build, especially over an external network or through a jump server. Some platforms simply recycle the build container after each build and never compare the source against the previous build at all. Here, with just a few dozen simple, fast, concurrent HEAD requests, we can figure out how much code needs uploading, and the server can concurrently query OSS by hash to see whether a file already exists. In practice, the query-and-diff time for large projects can be kept to a few seconds, greatly reducing the time spent preparing a build.

As mentioned above, storing numbered versions of the source code in OSS or another database has several benefits: source files are mostly large, sometimes even media files; they are written once and never rewritten; and other services will at most read them in the future. Storing such a large volume separately does not burden the primary database, and it leaves room for future environment changes, migrations, and performance optimization. For example, when moving to a cloud provider, you can consider their large-file storage services with Intranet access.

Preparing to build

Preparing in the general service

In this design we do not start building until file synchronization is complete, because we keep the permission to build in the general service. The advantage is that users can rebuild all files of a specified version at any time with their own credentials, a requirement that fits smoothly into this deployment system's design. This is useful, for example, when Git hooks repeatedly trigger builds without any code changes.

In the build request, don't forget to gather the information the build needs, such as the user's preferred script, the specified before-build step, the specified builder (more on that below), and so on. You can now send a "build started" notification to the build service via the general service.

In the build service, almost no authentication is required: as long as the notification comes from the general service's push, it is accepted. At the code level we accept essentially one simple piece of information, the version number. Because all the source code is stored in external storage, this unique version number is enough to find all the files related to the build and synchronize them. (This synchronization is read-only and happens over the Intranet, so the time cost is very small; don't worry about it.)

[general service] <---> [build service] <--- [code storage]

On the general service side, you can't just drop the user or Git connection; you need to hold the link and stream build logs back. You can use a third-party logging service or pass logs through from the build server's log database. Either way, log-viewing permission stays in this central service, making it easy to design or integrate a permissions system. Later you can add other commands to the central service, such as stopping a build or restarting immediately: the central service just updates its record database once and notifies the build service.

Preparing in the build service

Once we receive the build notification, we can simply start downloading the source from code storage, but that's not enough: we must restore the source files to the hierarchy the user uploaded. As you probably guessed, this is easy. Because we record each file's relative location in its description, the build service needs only one walk to recover the entire code base.

Of course, this user code will need to be built in isolation in the future, so we need to have at least some containers ready. Here are two options:

  1. Use a separate script to control the container.

    Separate scripts are easier to split out and maintain, and you can write the administration scripts in a language you know well; for example, many mature Python or shell administration scripts can be reused with small modifications. The difficulty is that the only way to inject logic or parameters from your code is through environment variables, and if you need interactive control you have to keep track of the child process running the shell and manage it from your code.

  2. Control the container programmatically through an API.

    This way you can fine-tune the container from your code according to business logic, such as pre-allocating a resource pool. Of course, you must handle all the control details yourself: creating, managing, mounting, synchronizing logs, and so on.

Don't forget to set the environment variables before actually starting the container. Suppose we have a basic build script called build-init.sh. It should not involve any specific language, framework, or compilation mode; we use this initial script to plan how builds run across languages and frameworks (more details below). The metadata that comes with a build notification is best set in the container's environment variables: the current project's build preferences, the specified initialization location, and so on, but never any key. All environment variables set when the build container initializes should relate only to this build, not to the deployment, and should carry no business information about the project. This is for build security.
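The rule that build-time environment variables carry only build metadata and never secrets can be enforced with a small allowlist filter before the container starts. A sketch, where the variable names are examples rather than a fixed contract:

```javascript
// Keys we consider build metadata; anything else (tokens, DB URLs,
// deploy-time config) must never reach the build container.
const BUILD_SAFE_KEYS = ['BUILD_ID', 'BUILD_VERSION', 'PROJECT_LANG', 'BUILDER_NAME', 'ENTRY_DIR'];

// meta: object carrying the build notification's metadata.
// Returns the `KEY=value` array that container runtimes expect as Env.
function buildEnv(meta) {
  return BUILD_SAFE_KEYS
    .filter((key) => meta[key] !== undefined)
    .map((key) => `${key}=${meta[key]}`);
}
```

Anything not on the allowlist, such as a database password that leaked into the notification, is silently dropped before the container sees it.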

Build

I've seen a lot of "platform" services built on Jenkins that claim to be automated, packaging and releasing with a few lines of script (you may well be suffering from one at your own company), and most of them share some problems:

  • Slow releases (because builds are hard)
  • Few supported languages, making extension extremely difficult
  • Build scripts that are hard to debug, with confused debugging permissions
  • Builds that are opaque, unisolated, and uncontrollable

When you decide to optimize a build script, you find the command buried in the middle of a pile of commands, only to discover it's a service someone once installed, and optimizing it is nearly impossible when everything rests on a heap of old, cumbersome scripts. This is no exaggeration. The mess comes from the build step never being decoupled when the platform was first designed, let alone designing extensible build slots per language; faced with new languages and frameworks, it gradually becomes unmaintainable and unpleasant to use. Of course, some found a clever way to be lazy: make the user write a Dockerfile for every build, caring about neither the process nor the content, the so-called "no-ops" way.

Build principles

Since we need to run the build results on Serverless, building alone is not enough; we also need some functionality to smooth the gap between Serverless and SCM. Our goal is to make the build process as simple as on a single machine, yet able to migrate seamlessly to Serverless. In addition, the build service may face multiple languages and frameworks, so on top of getting the job done we should abstract and decompose it sensibly, making the build process simple enough that anyone can write for it and each framework's scripts can be swapped freely. Taking NodeJS as an example, this article introduces the following design:

[builder A] [builder B] [builder C] ...   // repos
---------------- [NPM/CDN/OSS] ----------------
                 <container>
                  init.sh
                     ↓
                  starter.js (nodejs shim)
                     ↓
                  require([builder])(configs)
                     ↓
                  output files
                     ↓
                  end.sh

In a build process, there are the following steps:

  1. Start init.sh (or whatever script the container's command runs first) at container startup. This script maintains the base environment.

  2. Run starter.js for the build. (You can also skip step 1 and solve everything in NodeJS.)

    2.a starter.js collects the user-specified build targets and can do its own pre-build work, such as analyzing the target framework.

    2.b According to the specified framework, it downloads the prepared script, which we call the builder, from NPM/CDN/OSS or wherever you like.

    2.c It requires the builder and injects the current build information: entry, output target, available working directory, and so on.

    2.d Because init.sh did the initialization, the builder only builds inside the available working directory and writes its result to the output location.

  3. On NodeJS exit, end.sh collects the exit code and other information and uploads the output files to code storage 2, which is dedicated to storing built code.

To ensure builds can complete, we also need to prepare the appropriate builders; their granularity depends on the scenario. The advantage is that the builder we maintain is no longer a script executed in a child process, but a contextual, pre-processed plug-in whose author doesn't have to worry about process management or preprocessing at all. A builder author only writes a function that takes a specific argument (you can constrain the plug-in function with a generic type), and that function does exactly one thing: perform the specified build based on the arguments passed in. Internally, the builder project is easy to maintain for any new framework or language (just add a plug-in), and in an enterprise or open-source setting, experts in each language and framework can own their builders; they have their own ways of optimizing the frameworks they are immersed in. And you can still write shell inside a plug-in if you like.
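The builder contract described above can be sketched as a plain function plus a tiny dispatcher on the starter.js side. All names here (`nodeBuilder`, `runBuild`, the config fields) are illustrative assumptions:

```javascript
// A builder receives a pre-processed context and only builds; no process
// management, no environment setup. This one is a trivial stand-in: a real
// builder would compile/bundle `entry` and emit files into `outDir`.
const nodeBuilder = ({ entry, outDir }) => {
  return { entry, outDir, status: 'ok' };
};

// starter.js side: pick the builder by framework name and call it with
// the injected build information.
function runBuild(builders, config) {
  const builder = builders[config.framework];
  if (!builder) throw new Error(`no builder for ${config.framework}`);
  return builder(config);
}
```

Adding support for a new framework is then just registering one more function in the `builders` map, which is what makes per-framework ownership practical.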

Since each builder has a context and is framework-specific, testing and maintenance are easier, and you can use test cases to ensure build stability whenever a plug-in is changed or added.

Linking to Serverless

As mentioned in the previous section, inserting language-specific plug-ins into the build process may seem to add complexity, but faced with each Serverless platform's different language interfaces, you have to handle each language and framework anyway, because an ordinary project cannot be deployed directly on a Serverless platform. Different platforms constrain context, handler, and so on differently per language, and you need to decide how to link based on your chosen platform (e.g. AWS/Azure/Google/Aliyun) and framework.

Using NodeJS as an example: most Serverless platforms encapsulate objects such as req/res/context. We can design a bridge.js to smooth over the differences:

// bridge.js
module.exports.handler = (req, res, context) => {
  // req.xx = ...
  // res.json = function () {} ...

  try {
    // PLACEHOLDER
  } catch (e) {
    console.error(e)
    process.exit(1)
  }
  return code(req, res, context)
}

In each builder, design the bridge around the differences between the platform and the language framework, and eventually replace the PLACEHOLDER with the built entry. In NodeJS, for example, we need to extend req/res into standard HTTP objects. When actually deployed on different Serverless platforms, you can specify that execution starts from the bridge and only references business code after this middleware has smoothed over the differences.

We try not to hack user code during deployment, and services for monitoring, health checking, and logging can be integrated at the bridge level.

Recycling the code

At the end of a build, the build service must recycle the outputs into designated storage rather than passing the artifacts on directly. On the one hand this makes repeated releases of the same build convenient; on the other, we need to show concrete build results to users, and if necessary let them download the files and deploy manually. When the build completes with a correct exit code, copy the code out of the container and upload it. As with collecting source, you can record each file's location and store files individually, or pack them and store the archive in OSS; most Serverless platforms support deploying a compressed code package directly from OSS, which saves a lot of follow-up work.

Note that so far we have not touched business-related environment variables, keys, or other business information, so the code packages taken out of the container are built but not yet runnable; they are combined with configuration and distributed to each environment by the deploy service. Yes, in this design the build process does not distinguish environments, and the built code contains no environment information. If your business strongly couples builds to environments, you could deploy the build service on multiple nodes (though this is not recommended).

Deployment

Deployment target

Serverless is characterized by fast startup and no complex business in a single run, and decoupling business into fine-grained logic lets developers focus on the logic itself, so we typically do not deploy business routing inside each Serverless node. It's not impossible: you can dispatch different transactions by recognizing path parameters through traditional routing, but in many designs this turns a single Serverless service into a huge, comprehensive, long-running application instead of split components, losing the original advantages: scale-out capability, low cost, fast deployment and startup, automatic shrinkage, and more.

Imagine a business where the core services are called with high frequency while the peripheral services are called far less and can even lean on caches. They can be completely decoupled into multiple services and deployed with a single click on your platform. The core business automatically benefits from Serverless's scaling, load handling, and fast startup, while the peripheral services "sleep" most of the time to save resources; when needed, they can keep scaling horizontally and be accessed painlessly. In some scenarios this is a rare and efficient solution.

Does all business routing and forwarding have to be handed to the gateways or load balancers provided by the vendor? Not necessarily. Once you consider that each piece of the business can be deployed as its own module on a Serverless node, the connections between modules can also be moved outside, rather than one computing function handling one problem. In short, you can do anything, depending on the business scenario.

State is also worth considering before deploying. Since Serverless is stateless, we do not store any data in the container (it would not make sense). The common solution is to connect to third-party databases and services, such as databases reachable over the cloud platform's Intranet; all third-party services need per-environment configuration at deployment time. Stateless is a good programming style but not for everyone: you can no longer rely on memory to hold all state, nor share memory across services (they are already in different containers), and whenever you want to keep some state you need to think about communication.

Deployment method

Since we kept the built code package intact in OSS at build time, deploying the code is as simple as locating it in storage each time we receive a general service notification (no download needed). As mentioned above, the general service also collects the user-configured Serverless external routing information, keys, environment variables, and so on, which we need to sort out and create one by one:

  1. Create projects, services, and so on for the user on the Serverless platform, depending on the platform interface.
  2. Create a Lambda (or the platform's equivalent) with the entry pointed at our fixed bridge.
  3. Fill in the user-specified information, such as memory size, execution limits, keys, etc.
  4. Create third-party services such as gateways (if any).
  5. Create a subdomain based on the user and project information and point it at this Lambda.
  6. Create a CDN based on the specified cache information (if any).
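The steps above can be sketched as one orchestration function. The platform client here is a stub standing in for whatever SDK your provider offers; every method name and the config shape are assumptions for illustration:

```javascript
// Stub platform client: each method stands in for a real cloud SDK call
// and records what it would have done into `log`.
function makePlatformStub(log) {
  return {
    createFunction: (name, pkgUrl, entry) => log.push(`function:${name}@${entry}`),
    configure: (name, opts) => log.push(`config:${name}:${opts.memory}MB`),
    createGateway: (name) => log.push(`gateway:${name}`),
    bindSubdomain: (sub, name) => log.push(`domain:${sub}->${name}`),
  };
}

// One deployment = create a brand-new endpoint, never mutate an old one.
function deploy(platform, { project, version, user, config }) {
  const fnName = `${project}-${version}`;
  platform.createFunction(fnName, config.packageUrl, 'bridge.handler');
  platform.configure(fnName, { memory: config.memory });
  if (config.gateway) platform.createGateway(fnName);
  const subdomain = `${version}.${project}.${user}`;
  platform.bindSubdomain(subdomain, fnName);
  return { fnName, subdomain };
}
```

Swapping the stub for a real SDK client keeps the orchestration itself unchanged, which is what makes the deploy service easy to test.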

After completing these steps, a Serverless service that can serve traffic directly is essentially deployed. The deploy service can then notify the general service that deployment is complete and deliver the aggregated information to the user.

On every Serverless platform there is no billing (or very low billing) until the service is actually invoked, so the deployment strategy differs from traditional replacement or rolling strategies: we always add rather than modify. Every time we receive a new notification, whether a new project, a new release, or a redeployment, we create a new function service and configure a new subdomain (or other marker). When a node needs to go to production, we direct production traffic to the function via gateway, load balancing, custom routing, DNS resolution, and so on. The benefits are obvious:

  1. Any version is always traceable, with a simple design and fast rollback
  2. Even large businesses can be surveyed at a glance by counting resources per endpoint
  3. You can do pre-production preheating and seamless switching
  4. Multiple environments are just multiple services and domains pointed at the specified endpoints
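Because every deployment is additive, production switching and rollback reduce to re-pointing a route at an endpoint that already exists. A minimal sketch of that routing table; the class and field names are illustrative:

```javascript
// Routing table: production alias -> immutable deployment endpoint.
class Router {
  constructor() {
    this.routes = new Map();   // alias -> current endpoint
    this.history = new Map();  // alias -> stack of previous endpoints
  }

  // Point the alias at a new endpoint; remember the previous one.
  promote(alias, endpoint) {
    const prev = this.routes.get(alias);
    if (prev) {
      const stack = this.history.get(alias) || [];
      stack.push(prev);
      this.history.set(alias, stack);
    }
    this.routes.set(alias, endpoint);
  }

  // Rollback = pop the previous endpoint; the old function still exists,
  // so nothing has to be rebuilt or redeployed.
  rollback(alias) {
    const stack = this.history.get(alias) || [];
    const prev = stack.pop();
    if (prev) this.routes.set(alias, prev);
    return this.routes.get(alias);
  }
}
```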

For front-end resources, static assets can be served from a function with a CDN in front; for the now-popular SSR, a function-compute container is a perfect server.

Optimization and Practice

About Deployment Speed

Compared with traditional packaged deployment, the FaaS-based deployment architecture described here greatly improves deployment speed, and thanks to the decomposition of projects, a function module can be collected and deployed in seconds even for large projects. This is partly due to the optimized code-package updates and partly due to the almost negligible startup time of a Serverless endpoint. For teams that value rapid deployment and rapid iteration, this is close to an optimal solution.

A few projects still have too many external dependencies after business separation, which makes builds slow. We can create an additional cache storage for external dependencies, differentiated per project during builds. This cache is provided in the builder's context, and each builder decides when to use it; the design differs per language and framework and can be optimized in each domain.

About build containers

Although this article recommends building code in manually managed containers, you can still build on Serverless itself, that is, create an individual endpoint for every build. The advantage is reduced management and development cost, but current domestic cloud platforms generally offer functions with limited memory and even less code space, so practicality is poor. If you are on a platform like AWS/Azure (or self-hosting, e.g. Fission on Kubernetes), you can try this simpler solution.

Security

For the platform's security and availability, refer to the underlying Serverless platform: the deployment platform performs deployment but does not run the containers. For the deployment platform itself, security concerns focus on injected deployment scripts, but builds always happen only inside containers without affecting other projects, and you can whitelist and filter builders if necessary.

Management

We collect information about projects, deployments, logs, and more; if you need a visual console like a CMDB, you can easily aggregate the data into web services. We integrated business-specific monitoring into the bridge, with runtime performance monitoring linked to the cloud platform's query interface.