Recently, the Cloud + Community technology salon “Efficient and Intelligent O&M” came to a successful end. Centered on operation and maintenance (O&M), the salon shared hands-on business O&M practices covering AIOps, Serverless DevOps, the Blue Whale PaaS platform, K8S, and more. Tencent also presented practices from moving its massive in-house businesses to the cloud, promoting the shift from traditional O&M to cloud O&M. This article is based on Kong Lingfei’s talk on the O&M capabilities of Tencent Cloud Serverless, Serverless’s impact on O&M, and the O&M case of the WeChat mini program album. It was first published on the “Yunjia Community” public account.

I. Preface: What is our core appeal in the Internet era?

Before we start: what is our core appeal in the Internet age? Our core appeal is applications, which provide business capabilities.

As shown in the figure below, for an application to provide external services, we need to deploy it somewhere, which requires a range of system resources such as compute, network, storage, and databases. After deployment, we also need to update the application, monitor its running status, and so on. Together, these cover essentially all of our requirements.

To meet these requirements, at the application level we adopt software architectures such as monolithic architectures and microservice development frameworks. We use Docker, KVM, and similar technologies to provide system resources, and we implement application lifecycle management with EFK, Prometheus, and Coding. Once these components are introduced, we need people to operate and maintain them. The underlying system resources require system O&M. EFK and Prometheus provide business O&M capabilities, but these platforms themselves must also be operated and maintained. What we really need, however, is business O&M.

So is there a way to minimize, or even eliminate, system and platform O&M? Yes: we can adopt a Serverless technical solution. In this talk, I will use Tencent Cloud Serverless products to explain how Serverless technology reduces the platform O&M and system O&M that users have to perform.

This talk covers the following topics:

What is Serverless

Business O&M capabilities of Serverless

Comparison of O&M capabilities between Serverless and IaaS

How the WeChat mini program album is operated and maintained with Serverless technology

II. Introduction to Serverless

1. Serverless: The new trend in cloud computing

Before I get into what Serverless is, I want to show you how popular Serverless is right now.

In recent years, microservices and K8S have become very popular. This chart compares their popularity with that of Serverless; the blue curve is Serverless. The figure on the right shows how Serverless has been productized. Serverless was first proposed in 2010. In 2014, AWS launched Lambda, productizing Serverless with good results. Seeing this, Microsoft, Google, and IBM launched their own Serverless products in 2016, and Alibaba Cloud and Tencent Cloud followed with theirs in 2017.

2. What is Serverless?

Here is what Serverless is. This logical architecture diagram shows our application at the top and system resources at the bottom. System resources can be provided through virtual machines, containers, databases, storage, and so on, and they must be maintained: resource requests, environment setup, disaster recovery, capacity expansion, and so on. Serverless means handing the underlying resources, and the O&M of those resources, over to the cloud vendor. These resources are a black box to the business, which only needs to focus on developing its own business logic. This architectural idea and approach is Serverless.

Serverless is an idea and method for software system architecture, not a software framework, class library, or tool. Its core idea is that you do not need to care about underlying resources such as CPU, memory, and databases; you just focus on business development.

CaaS (Compute as a Service) provides computing power. BaaS (Backend as a Service) makes third-party backend components serverless: users do not need to build, operate, or maintain these components; they only call APIs to use the backend. Serverless = CaaS + BaaS.

3. Physical Machine vs. Virtual Machine vs. Container vs. Serverless

Let’s compare the value that Serverless provides. In software development, two steps are unavoidable: deployment and O&M. To launch a business in the physical machine stage, we had to buy physical servers, possibly build our own machine room, install cooling equipment, recruit O&M staff, and then build a stack of infrastructure such as virtualization, operating systems, and containers: a great deal of work. In the virtual machine stage, cloud vendors maintain the hardware and virtualization infrastructure; in the container stage, they also maintain the OS, containers, and runtime, so users do less O&M work. In the Serverless stage, the user only needs to focus on the Function, that is, their business logic. As the stages progress, users have fewer things to worry about and can concentrate more on their business logic. For example, a business that needs 8 people to develop in the physical machine stage may need only 2 in the Serverless stage, a large saving in manpower. The saved manpower can go into business R&D, speeding up product iteration and improving product competitiveness.

This chart also shows that over the past decade or so, cloud computing has been a process of “de-infrastructuring”, letting users focus on the business development they actually need rather than on the underlying computing resources. Serverless follows this direction of cloud computing, and this model gives it great potential value. In this sense, Serverless can be considered the ultimate form of cloud computing.

4. A Serverless execution example

Here is an example of a function running on the Tencent Cloud Serverless platform.

To use cloud functions, a user first develops the business logic locally. Once the function code is complete, it can be conveniently deployed to our platform with the VS Code plug-in we provide. The function may call third-party BaaS services. Next, the function is bound to triggers such as API Gateway, CKafka, and COS. Calling the API gateway, uploading a file to COS, and similar actions generate trigger events, which invoke the bound function to execute the business logic. As requests arrive, the platform automatically scales the backend function instances out or in according to request volume. Users do not need to do any system-level O&M work.
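To make this flow concrete, here is a minimal sketch of a Python function entry point. It assumes the `main_handler(event, context)` convention of the Tencent Cloud SCF Python runtime and an API Gateway event carrying `httpMethod` and a JSON string `body`; the response fields (`isBase64Encoded`, `statusCode`, `headers`, `body`) follow the API Gateway integration-response shape as we understand it and should be checked against the product docs. `save_record` is a hypothetical in-memory stand-in for a BaaS call, not a real SDK function.

```python
import json

# Hypothetical in-memory stand-in for a BaaS database (not a real SDK).
_FAKE_STORE = {}


def save_record(record):
    """Hypothetical BaaS call; a real function would call a managed service API here."""
    record_id = f"rec-{len(_FAKE_STORE) + 1}"
    _FAKE_STORE[record_id] = record
    return record_id


def _response(status, payload):
    # Assumed API Gateway integration-response shape.
    return {
        "isBase64Encoded": False,
        "statusCode": status,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(payload),
    }


def main_handler(event, context):
    """Entry point the platform invokes once per trigger event (assumed event shape)."""
    if event.get("httpMethod") != "POST":
        return _response(405, {"error": "method not allowed"})
    payload = json.loads(event.get("body") or "{}")
    record_id = save_record(payload)  # business logic delegating storage to a BaaS
    return _response(200, {"id": record_id})
```

Locally, the handler can be exercised with a hand-built event such as `main_handler({"httpMethod": "POST", "body": '{"name": "photo.jpg"}'}, None)`; on the platform, the trigger builds the event and scaling is handled for you.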

5. Serverless in practice

Here is a video on how to create and execute cloud functions.

III. Serverless business O&M capabilities

Next, I will introduce how the Tencent Cloud Serverless platform provides out-of-the-box business O&M capabilities. This section covers tooling, DevOps, logs, and monitoring and alarms.

1. Tooling

Tencent Cloud Serverless provides a number of tools that help R&D develop and debug, and help O&M deploy functions online more easily. VS Code is the IDE most used by users in China, so we developed a VS Code plug-in that makes it easy to develop and deploy functions. We also provide a Web IDE for developing directly in the browser, and a command line tool for developing and operating directly from a Linux terminal. Building on the command line tool, users can also integrate with various DevOps platforms or automate their work.

2. DevOps solutions

In addition to developer tools, we provide complete DevOps support, from best practices and workflows to toolchains and products.

For the workflow, we support coding, building, packaging, deploying, testing, and publishing. For tools, we provide a CLI, an application model, and more. We have also integrated with many products, such as Git repositories and API Gateway, so that users can easily interact with them and use their capabilities. This is our DevOps support.

3. Coding DevOps

Here are a few screenshots of managing CI/CD for Serverless applications with Coding DevOps. Coding’s continuous integration records the build and test logs of each function application. Coding artifacts centrally store function images with version history. Finally, O&M staff can deploy functions to different environments through Coding deployment.

4. Logs

Two log query methods are supported. The first is to query directly on our Serverless platform, where you can check whether each function invocation succeeded, the duration of each stage, and the logs users printed through logging or standard output; logs can be searched by RequestId. The second is to export logs to the Tencent Cloud log service for persistent storage. There, users can search logs with regular expressions, save custom search rules for later retrieval, and generate alarms based on logs.
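The two search modes described above (by RequestId and by regular expression) can be sketched as a simple filter. This is only an illustration: the log line layout `<timestamp> <RequestId> <LEVEL> <message>` is a hypothetical format chosen for the example, not the platform’s actual log schema.

```python
import re
from typing import Iterable, Iterator, Optional

# Hypothetical log line layout, assumed for illustration only:
#   "<ISO timestamp> <RequestId> <LEVEL> <message>"
LINE_RE = re.compile(r"^(?P<ts>\S+) (?P<request_id>\S+) (?P<level>[A-Z]+) (?P<msg>.*)$")


def search_logs(lines: Iterable[str],
                request_id: Optional[str] = None,
                pattern: Optional[str] = None) -> Iterator[dict]:
    """Yield parsed entries matching an optional RequestId and an optional message regex."""
    msg_re = re.compile(pattern) if pattern else None
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # skip lines that do not follow the assumed layout
        entry = m.groupdict()
        if request_id and entry["request_id"] != request_id:
            continue
        if msg_re and not msg_re.search(entry["msg"]):
            continue
        yield entry
```

Filtering by RequestId reconstructs one invocation’s trace, while a regex such as `timeout|db` finds a class of failures across invocations, which is the kind of query a log-based alarm rule would run.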

5. Monitoring and alarms

We provide monitoring in three dimensions. The first covers the number of invocations, resource usage, and outbound traffic over a month. The second, by region, covers invocations, runtime, errors, concurrency, and throttled invocations, all indicators that users care about. The third is function-level monitoring: invocations, the number of times resource limits were exceeded, execution timeouts, memory overruns, and so on. Alarms on all of these metrics can be configured in the Tencent Cloud monitoring system, providing business-level monitoring.
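As an illustration of how function-level indicators turn into an alarm, the sketch below aggregates hypothetical per-invocation records into per-function counts (calls, errors, timeouts, memory overruns, throttles) and fires when the failure rate crosses a threshold. The `Invocation` record, its status values, and the 5% default threshold are all assumptions for the example; on the platform these metrics are collected for you.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Invocation:
    """Hypothetical per-invocation record; field names are assumptions."""
    function: str
    duration_ms: float
    status: str  # assumed values: "ok", "error", "timeout", "oom", "throttled"


def aggregate(invocations):
    """Per-function counters similar to the function-level metrics described above."""
    stats = defaultdict(lambda: {"calls": 0, "error": 0, "timeout": 0,
                                 "oom": 0, "throttled": 0, "total_ms": 0.0})
    for inv in invocations:
        s = stats[inv.function]
        s["calls"] += 1
        s["total_ms"] += inv.duration_ms
        if inv.status != "ok":
            s[inv.status] = s.get(inv.status, 0) + 1
    return dict(stats)


def alarms(stats, max_failure_rate=0.05):
    """Return functions whose share of failed invocations exceeds the threshold."""
    fired = []
    for fn, s in stats.items():
        failures = s["error"] + s["timeout"] + s["oom"]
        if s["calls"] and failures / s["calls"] > max_failure_rate:
            fired.append(fn)
    return fired
```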

IV. Serverless system O&M capabilities

Here is the system O&M capability provided by Serverless. The Serverless infrastructure creates an MVM for each user. An MVM is a lightweight virtual machine that provides strong VM-level security isolation and can start in milliseconds with very low latency. Docker containers are created inside the MVM, and the user’s functions are scheduled to run in these containers. The containers provide process-level isolation and finer-grained resource allocation, which improves system resource utilization and reduces costs. During function execution, a scheduling algorithm scales out in real time based on CPU, memory, network IO, and request volume to meet user requests at peak hours. When requests decrease, the system periodically scales in to release resources and reduce costs. All of these capabilities are operated and maintained by the cloud vendor, not by users, who only need to focus on their own business logic. In other words, users can ignore system O&M and focus only on business O&M.
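The scale-out and periodic scale-in behavior described above can be sketched as two small decisions. This is a simplified, hypothetical model, not the platform’s scheduling algorithm: it only considers in-flight requests (ignoring the CPU, memory, and network IO signals mentioned above), and `per_instance_concurrency`, the instance bounds, and the idle threshold are assumed parameters.

```python
import math
from dataclasses import dataclass


@dataclass
class ScalingPolicy:
    """Hypothetical policy parameters chosen for illustration."""
    per_instance_concurrency: int = 1   # requests one instance handles at once
    min_instances: int = 0
    max_instances: int = 1000
    scale_in_idle_seconds: float = 60.0  # idle time before an instance may be released


def desired_instances(policy, in_flight_requests):
    """Scale out immediately so current concurrent requests are covered, within bounds."""
    need = math.ceil(in_flight_requests / policy.per_instance_concurrency)
    return max(policy.min_instances, min(policy.max_instances, need))


def instances_to_release(policy, idle_seconds, needed):
    """Periodic scale-in: release instances idle past the threshold,
    but never go below what the current load needs."""
    current = len(idle_seconds)
    expired = sum(1 for s in idle_seconds if s >= policy.scale_in_idle_seconds)
    return max(0, min(expired, current - needed))
```

Scaling out is eager (computed per request burst) while scaling in waits for instances to stay idle, which mirrors the asymmetry in the text: meet peaks immediately, release resources periodically.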

V. Comparison between Serverless and IaaS O&M capabilities

This section compares Serverless with the O&M capabilities provided at the IaaS layer, to show more directly how Serverless improves O&M efficiency. The comparison covers two dimensions: 1. basic O&M capabilities; 2. core O&M capabilities.

1. Resource creation

(1) IaaS

First, let’s look at how O&M works for traditional IaaS applications, starting with resource creation. This phase generally begins when the development department submits a request to the O&M department to bring a new application online. After receiving the request, the O&M department creates a batch of VMs in each availability zone as required and configures the network, firewall, and routing rules. Because this crosses departments and involves scheduling, it is not fast.

After the cluster is created, O&M installs the software specified by the development department, such as the JDK runtime and the Tomcat server. O&M then installs its own software, such as the monitoring tool Prometheus and the logging tool Logstash, and possibly others such as security software and full-link tracing software.

Once the machines are set up, the DevOps pipeline must be configured and the machines added to each environment in it: a development environment, a pre-release environment, perhaps a second development and a second pre-release environment if there are many developers, and finally the production environment. For high availability and fault tolerance, O&M staff divide the production environment into several small clusters deployed in different availability zones. If the enterprise has an in-house automated O&M platform, setting up all the environments may take a few days at best; if everything is configured manually, it may take several weeks. In addition, deploying and maintaining the CI/CD tools themselves, such as Jenkins clusters, GitLab servers, and Chef servers, also requires a lot of O&M manpower.

Checking the environments back and forth can take several more days, so the whole process may take weeks before development can finally begin.

(2) Serverless

Now let’s look at how Serverless technology provides system resources. As noted in the introduction, under Serverless users do not need to request, operate, or maintain the underlying system resources, so the resource-request and software-installation steps above disappear. The Serverless platform also provides DevOps functionality out of the box. Under Serverless, users only need to do the last step: development.

In other words, under Serverless we do not need to create any resources. With no such O&M required, R&D efficiency improves, and so does product competitiveness.

2. Service deployment

In the virtual machine era, deploying a service requires O&M to configure the operating system and service components. Business O&M also has to implement blue-green release and rollback logic, which is complicated.

In the container era, deploying a business still requires O&M involvement, and O&M must write complex YAML files, which demands more skills from O&M staff. Each deployment requires modifying the YAML files and running a series of commands. This is much easier than in the virtual machine era, but still takes some work.

On the Serverless platform, developers only need to upload their code to complete a deployment. The platform already implements blue-green release and rollback, so R&D does not need to deal with them, which minimizes the complexity of business deployment.
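To show what “blue-green release and rollback handled by the platform” amounts to, here is a hypothetical model of a function alias that routes traffic between a stable version and a new one by weight. Setting the weight to 1.0 and promoting is the blue-green switch; rollback drops the new version. The `Alias` class and its methods are illustrative only and are not the Tencent Cloud API.

```python
import random
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Alias:
    """Hypothetical alias splitting traffic between a stable and a new version."""
    stable: str
    candidate: Optional[str] = None
    candidate_weight: float = 0.0  # share of traffic sent to the candidate

    def start_release(self, version: str, weight: float) -> None:
        if not 0.0 <= weight <= 1.0:
            raise ValueError("weight must be between 0.0 and 1.0")
        self.candidate, self.candidate_weight = version, weight

    def route(self, rng: Callable[[], float] = random.random) -> str:
        """Pick the version that serves one request."""
        if self.candidate is not None and rng() < self.candidate_weight:
            return self.candidate
        return self.stable

    def promote(self) -> None:
        """Complete the release: the new version becomes the stable one."""
        if self.candidate is None:
            raise RuntimeError("no candidate version to promote")
        self.stable, self.candidate, self.candidate_weight = self.candidate, None, 0.0

    def rollback(self) -> None:
        """Abort the release: all traffic returns to the stable version."""
        self.candidate, self.candidate_weight = None, 0.0
```

Because rollback only changes routing state, it takes effect on the next request without redeploying anything, which is why it can be offered as a one-click platform feature.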

3. Monitoring and alarms

In traditional applications, most monitoring is configured by O&M staff, including network, system, application, and service monitoring. In Serverless applications, most application, system, and network monitoring no longer needs attention, and the platform does not expose it. Instead, it exposes higher-level indicators such as invocation count, runtime, memory, concurrent executions, and throttled invocations. However, you still need to collect logs for service monitoring.
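Since service monitoring is built from logs, one minimal approach is to print one JSON object per business event and derive counts from the collected lines. This assumes, as the Logs section describes, that what a function prints to standard output is collected as its log; the event names and fields below are hypothetical.

```python
import json
import time


def business_log(event, **fields):
    """Print one structured log line for a business event and return it."""
    record = {"ts": round(time.time(), 3), "event": event, **fields}
    line = json.dumps(record, sort_keys=True)
    print(line)  # stdout is assumed to be collected as the function's log
    return line


def count_events(lines, event):
    """Derive a simple service-level metric: occurrences of one event in collected logs."""
    count = 0
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip plain-text lines that are not structured events
        if isinstance(record, dict) and record.get("event") == event:
            count += 1
    return count
```

Structured lines keep business events machine-readable alongside ordinary debug output, so the same log pipeline serves both troubleshooting and business metrics.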

4. Troubleshooting

Traditional applications have rich troubleshooting methods, such as full-link tracing and various kinds of monitoring. But these are largely limited to large companies with strong technical capabilities, because building and maintaining such toolsets is labor-intensive. Small companies often can only troubleshoot through logs and code review.

The Tencent Cloud Serverless platform provides many troubleshooting tools, such as full-link tracing, monitoring in various dimensions, and professional log query. So in the Serverless era, O&M staff do not need to build troubleshooting tools.

5. Elastic scaling

Elastic scaling is a core O&M capability, and its key concern is usually how long scaling takes. In the VM stage, when resources are insufficient, you must request a VM and then log in to perform system-level configuration and deployment, which usually takes hours. In the container stage, you rely on the elastic scaling of K8S, but K8S scaling policies generally achieve only minute-level response. In the Serverless stage, real-time, millisecond-level scaling can be achieved.

6. Fault recovery

Fault recovery is another O&M concern. In the VM stage, O&M must implement fault recovery logic; without it, O&M has to intervene after a fault occurs, and manual intervention is slow. In the container stage, second-level failover is possible thanks to the self-healing capabilities of K8S. In the Serverless stage, requests are served by instances that the platform creates on demand, so there is no failover for users to handle.

7. Performance tuning

Performance tuning is always an advanced topic that requires extensive experience. Tuning traditional applications typically involves virtual machine, database, network, Linux, runtime, and server parameters. Because of the nature of Serverless applications, tuning these underlying parameters is handled by the cloud vendor, and users only need to tune performance at the code level.

8. Security

Because their technology stacks are complex and flexible, traditional IaaS applications have many security requirements: host and network security, application security, access control management, endpoint security, data security, and so on all need attention from O&M security staff. Large companies typically maintain a large security department to keep the group secure. For a small company, maintaining such a department would be a heavy drag on the business, so small companies find security very hard to guarantee. For example, database passwords may never be changed, or databases may be exposed on public IP addresses; small companies may not have the energy to study these products thoroughly.

Under Serverless, however, the underlying resources are maintained by the cloud vendor, whose security is professionally guaranteed. For example, the Tencent Cloud Serverless platform provides network isolation, execution environment isolation, management environment isolation, function resource limits, file system directory restrictions, and system call restrictions. This improves security while saving the corresponding manpower. Users’ O&M security work is then mainly the security of their own code.

VI. Case: How the WeChat mini program album is operated and maintained under Serverless

The development process

Let’s use the WeChat mini program album to see how O&M works under Serverless. Without Serverless, developing an album mini program might take N weeks just to assemble the team, and then several days to register various accounts. Next comes O&M-related work such as purchasing a domain name, purchasing CVMs, and completing the ICP filing for the domain. After purchasing resources, we still need to install Nginx, MySQL, monitoring and logging systems, and so on, which takes about 3 weeks or more. Only then does real business development begin. So developing a mini program this way usually takes a month or two.

When it finally goes live, I need to maintain…

Once the mini program is developed, we still face all kinds of complex O&M work: many server-side components to operate and maintain, complex testing, and considerable manpower to ensure security and stability.

Focus only on the core business

This figure shows the Serverless solution. The cloud vendor handles resource provisioning and the deployment of the related components; this O&M work is maintained by the cloud vendor, and users do not need to care about it. Users only need to focus on developing the core business logic and the database CRUD operations. This greatly simplifies the development process and reduces the O&M workload.

With the Serverless solution, one colleague completed the core business logic in just 2 weeks.

About the speaker

Kong Lingfei, senior architect at Tencent Cloud, is responsible for product development of Tencent Cloud Functions. He helps users build Serverless system architectures and works with product managers on the planning and construction of the cloud function platform. He previously worked on virtualization testing and R&D at Red Hat and Lenovo, has experience developing and architecting large-scale container clusters, and has a deep understanding of cloud computing technologies such as virtualization and containers.