On the afternoon of April 10, the iQiyi technical product team held an offline technology salon in its “i Technology Conference” series, themed “Exploration and Practice of Cloud Native Implementation”. Technical experts from Kuaishou, Baidu, and ByteDance were invited to share and discuss their practical experience of cloud native implementation with the iQiyi technical product team.

Among the speakers, Shang Liuyan, a technical expert from iQiyi, presented iQiyi's private cloud Serverless practice. The talk revolves around three keywords: Serverless, private cloud, and practical implementation.

**The following is the write-up of the talk “iQiyi Private Cloud Serverless Practice”,** compiled from the speech given on site at the i Technology Conference.

iQiyi Private Cloud Serverless Practice

Shang Liuyan, iQiyi Infrastructure Department

The first part of this talk introduces the concept of Serverless. The second part describes how Serverless services differ between public and private clouds. The third part covers iQiyi's specific implementation strategy and plans, along with the lessons we learned.

01 What is Serverless

While introducing Serverless, I want to answer a few questions that should help you understand it. One of the most useful is: is Serverless the same thing as FaaS?

Here is what the Chinese and English Wikipedia entries say about “Serverless”:

The Chinese entry treats Serverless as FaaS. The English entry divides Serverless into runtimes and databases, with FaaS falling under runtimes.

Here are some of the Serverless services currently on the market, from AWS and Aliyun:

AWS Serverless Services:

Aliyun Serverless Service:

At present, the list on Aliyun's official website is not very comprehensive. In practice its services are similar to those of AWS, but AWS divides its Serverless services into clear categories: computing, application integration, and data storage. Aliyun also categorizes its services, but not in as much detail as AWS.

Here you can see how FaaS and Serverless differ. Overall, FaaS is one part of the Serverless computing services; beyond FaaS, Amazon's Serverless computing category also includes offerings for ECS and EKS. The application integration category is richer still. The most commonly used service there is SQS, a message queue that requires no attention to the underlying resources. On the data storage side there are Serverless databases as well.

Today, any service that frees its users from caring about the underlying infrastructure can be called Serverless.

First, we do not maintain the underlying infrastructure. Second, we do not care about scaling its resources. Take a database as an example: we know it may be running on a K8s cluster, and we know it has memory, CPU, and disk, but we do not need to care about the state of those resources.

**So where should companies start when they want to build Serverless in a private cloud?** That is a question in its own right.

02 Differences Between Private Cloud and Public Cloud Serverless Services

There are many Serverless services now, but there were far fewer choices in 2018. It is no accident that the Chinese Wikipedia entry has not been updated since 2019: before then, we too thought Serverless was FaaS. So when iQiyi began implementing Serverless in 2018, the first step was to build FaaS.

In 2018, the open source community offered very little to refer to. Today there are two mature Serverless solutions, Knative and OpenFaaS. Knative was released as open source on January 31, 2018. OpenFaaS started as a personal project, and a company was later founded around it.

We also struggled with technology selection, because there was no good option. This was an internal innovation project, and the company invested very little manpower in it. We badly wanted support from the community, but at that time the community had no mature solution, and it was unclear which one would become mainstream. In the end we chose the Fn project. It has now gone 16 months without an update, and the project has been confirmed as discontinued. At the time, we chose it mainly for the following reasons:

First, Fn is Oracle's open source Serverless project. We saw Oracle as a leading company in the enterprise (B2B) market that should be good at this kind of service, so we expected the project to develop smoothly.

Second, the company was using Mesos for container orchestration at the time, and Fn had good support for Mesos.

In short, the Fn project was the most suitable choice for iQiyi at that time.

We integrated some of the company's internal services with Fn. After completing the MVP, we wanted to find business use cases to drive subsequent development. We looked to a classic AWS case, an elastic image resize service. This case fits the FaaS scenario very well: the advantage of FaaS is that there are no servers to manage, and image resizing is a relatively simple function that does not take much code. In addition, FaaS scales elastically and is billed per call, which makes this a great scenario for many companies, especially startups.

iQiyi also has an image service, and we thought this case would be a good first scenario for us to land. But the conversation with the image service team took a rather dramatic turn:

After this discussion, we realized that the scheme promoted by public clouds does not work well in a private cloud, for two main reasons:

1) FaaS really could not handle complex functionality (as of 2018), and container orchestration services were already quite good (for back-end engineers);

2) Pay-as-you-go billing is not even a consideration in a private cloud.

So, do private clouds really need Serverless?

And if so, in what form?

Do private clouds really need Serverless?

First, the role of FaaS in a public cloud is to connect other public cloud services. It is the product of cloud computing reaching a level of maturity (cloud native) that private clouds generally do not reach. It is also unrealistic for a private cloud to push its infrastructure to the level of a public cloud, because the two have different goals.

If we improved the maturity of the private cloud's infrastructure, could we then build Serverless the same way?

That is not realistic either, since public clouds have enjoyed economies of scale since 2015. A private cloud's architecture deliberately does not chase that scale, and it needs to differ from the public cloud in order to better support our own business. From this point of view, simply copying the public cloud when building a Serverless solution is a dead end.

If a private cloud really needs a Serverless service, what should it look like?

If the public cloud uses Serverless to connect public cloud services, then private cloud Serverless should connect the services that are specific to the private cloud and cannot be replaced by the public cloud, which are mostly application-level services. This is the event-driven route for the private cloud.

In addition, FaaS in a private cloud may offer little extra convenience to back-end engineers given their technology stack, but the same is not true for front-end engineers, who are less familiar with servers. Many of the best practices and cases for Aliyun Serverless are also in the front-end BaaS direction, which has been a major front-end trend over the past two years.

Finally, the benefits of Serverless for private clouds are primarily productivity gains, not resource cost savings as advertised by public clouds.

03 How Do We Implement the Serverless Solution?

First, tailor the approach to our own circumstances.

Our team sits in the infrastructure department, which is far from the front end, so the application-level event-driven direction is the more suitable one for us to push forward.

The public cloud also has event-driven offerings, such as AWS EventBridge (below), an event bus service for Serverless.

This service can not only trigger services through events but also compose events. Take an iQiyi example: a video finishes production, manual review passes, and AI review fails. When all three events have occurred, a manual review is triggered. Without an event bus service, this is a hassle: the events have to be cached first, a large number of unrelated messages arrive in between, and a separate component is needed to handle it all. Using a Serverless event bus service saves a lot of maintenance cost, improves development efficiency, and helps decouple the architecture.
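The event composition described above can be sketched in a few lines. This is a minimal in-memory illustration, not the API of AWS EventBridge or of iQiyi's event bus: the class `EventCompositor`, its `publish` method, and the event type names are all hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, FrozenSet, Iterable, List, Set


class EventCompositor:
    """Fires a callback once every required event type was seen for a key."""

    def __init__(self, required: Iterable[str],
                 on_complete: Callable[[str], None]) -> None:
        self.required: FrozenSet[str] = frozenset(required)
        self.on_complete = on_complete
        # Event types observed so far, grouped by correlation key (a video ID).
        self._seen: Dict[str, Set[str]] = defaultdict(set)

    def publish(self, event_type: str, key: str) -> bool:
        """Record one event; return True if it completed the composition."""
        if event_type not in self.required:
            return False  # unrelated traffic is ignored, not cached
        seen = self._seen[key]
        seen.add(event_type)
        if seen != self.required:
            return False
        del self._seen[key]  # reset so the rule can fire again later
        self.on_complete(key)
        return True


# Hypothetical event type names for the three events in the example.
triggered: List[str] = []
rule = EventCompositor(
    required={"video.produced", "review.manual.approved", "review.ai.rejected"},
    on_complete=triggered.append,
)
rule.publish("video.produced", "video-1")
rule.publish("review.ai.rejected", "video-1")
rule.publish("review.manual.approved", "video-2")  # belongs to another video
rule.publish("review.manual.approved", "video-1")  # third event for video-1
print(triggered)
```

A real event bus would also need persistence and expiry for partially matched events, which is exactly the maintenance burden the paragraph above says a managed service takes away.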

In 2019, the company pushed forward its middle-platform initiative, and the team hoped to land this technical solution as part of that effort.

After analyzing middle-platform architectures, we found that many middle platforms do not create new, more efficient services but simply combine existing ones. As the middle-platform effort evolved, we could work with the teams responsible for it to connect their events to the event bus. We took this idea to one such team, but new problems emerged: the middle-platform business team's goal was to solve its own middle-platform problems, and it had no spare capacity to help us open up events. So we went on to analyze the main problems these teams faced. At first, we thought middle-platform architecture was little more than connecting existing individual services, but in fact those services expose heterogeneous interfaces. In addition, combining them looks like a workflow, but observability, task retries, timeouts, circuit breaking, and so on all have to be handled.

Therefore, we wanted to design a Serverless service that can orchestrate different services with no operations work and generates its observability automatically. Such a service helps middle-platform teams solve the problems they face in their work. Both AWS and Aliyun have comparable services, such as AWS Step Functions. The service we built to solve this problem is Airworkflow.

Figure note: Serverless service for the middle platform: Airworkflow

A workflow is defined in a state machine description language. A private cloud does not need to match the public cloud; it only needs to solve its own problems as well as it can. Under the hood, this standard is implemented on Knative Serving and Eventing.

The following shows what we did about the observability problem.
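To make the idea of a state machine description concrete, here is a small sketch. The definition format below (`start_at`, `states`, `task`, `max_attempts`, `next`, `end`) is invented for illustration and is not Airworkflow's actual language; plain Python functions stand in for the heterogeneous services, and only retries are handled, not timeouts or circuit breaking.

```python
from typing import Any, Callable, Dict, List

# Hypothetical definition format; the field names are illustrative only.
WORKFLOW: Dict[str, Any] = {
    "start_at": "transcode",
    "states": {
        "transcode": {"task": "transcode", "max_attempts": 3, "next": "review"},
        "review": {"task": "review", "max_attempts": 1, "end": True},
    },
}

Task = Callable[[Dict[str, Any]], Dict[str, Any]]


def run_workflow(definition: Dict[str, Any], tasks: Dict[str, Task],
                 payload: Dict[str, Any]) -> Dict[str, Any]:
    """Walk the state machine, retrying each state up to max_attempts times."""
    trace: List[str] = []
    name = definition["start_at"]
    while True:
        state = definition["states"][name]
        handler = tasks[state["task"]]
        attempts = state.get("max_attempts", 1)
        for attempt in range(1, attempts + 1):
            try:
                payload = handler(payload)
                trace.append(f"{name}:ok:attempt={attempt}")
                break
            except Exception as exc:
                trace.append(f"{name}:error:attempt={attempt}")
                if attempt == attempts:
                    raise RuntimeError(f"state {name!r} failed") from exc
        if state.get("end"):
            return {"output": payload, "trace": trace}
        name = state["next"]


calls = {"transcode": 0}


def transcode(data: Dict[str, Any]) -> Dict[str, Any]:
    # Fails on the first call to simulate a transient error.
    calls["transcode"] += 1
    if calls["transcode"] == 1:
        raise ConnectionError("transient failure")
    return {**data, "transcoded": True}


def review(data: Dict[str, Any]) -> Dict[str, Any]:
    return {**data, "reviewed": True}


result = run_workflow(
    WORKFLOW, {"transcode": transcode, "review": review}, {"video_id": "video-1"}
)
print(result["trace"])
```

Because the retry policy lives in the definition rather than in each service, the services themselves stay unchanged, and the per-state trace comes for free.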

In terms of observability, the Airworkflow approach is particularly convenient. When users describe the state machine, they are in effect attaching business labels in advance. Because the code is associated with the state machine configuration, these labels can be generated automatically. Once the code is written, the observability data is built automatically along business lines. This is especially convenient for middle-platform teams.

Also, as mentioned earlier, our team wanted the middle-platform teams to open up their intermediate events, so we took that into account when creating the Workflow standard. As shown in yellow, a keyword in the definition publishes the results in a standard format.
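One way to picture automatically generated labels is a logger that is derived from the workflow and state names, so every log line carries them without the task code doing anything. The sketch below uses only Python's standard `logging` module; the helper `state_logger`, the `ListHandler` class, and the workflow name `publish-video` are hypothetical and do not describe how Airworkflow is actually built.

```python
import logging
from typing import List


class ListHandler(logging.Handler):
    """Collects formatted log lines in memory so they can be inspected."""

    def __init__(self) -> None:
        super().__init__()
        self.lines: List[str] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.lines.append(self.format(record))


def state_logger(workflow: str, state: str,
                 handler: logging.Handler) -> logging.LoggerAdapter:
    """Return a logger whose records are labeled with workflow and state."""
    logger = logging.getLogger(f"workflow.{workflow}.{state}")
    logger.setLevel(logging.INFO)
    logger.propagate = False
    if handler not in logger.handlers:
        logger.addHandler(handler)
    # The adapter injects the labels as extra fields on every record.
    return logging.LoggerAdapter(logger, {"workflow": workflow, "state": state})


handler = ListHandler()
handler.setFormatter(
    logging.Formatter("workflow=%(workflow)s state=%(state)s %(message)s")
)

# The state names would come from the state machine definition.
for state_name in ["transcode", "review"]:
    state_logger("publish-video", state_name, handler).info("state finished")

print(handler.lines)
```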

So what are business events?

Business events are the biggest difference between private and public clouds. A public cloud event is something like a new piece of data arriving in a message queue, while a private cloud event is a service-level event: a business event such as an order being created successfully or a payment being completed. Many of our applications would be easy to extend if these events were available to them at development time.

Figure note: Business events

As shown in the figure above, the purple block on the far right is the log, which is labeled so that it is clear which step each entry belongs to.

To make onboarding easier for users, we also developed a Dashboard platform.

Figure note: The Airworkflow Dashboard
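As a rough illustration of a business event published "in a standard format", the sketch below defines an event envelope and a tiny in-process bus. The field names loosely follow common event-envelope conventions, and `BusinessEvent`, `EventBus`, and the `order.paid` event type are assumptions made for this example rather than iQiyi's actual format.

```python
import json
import uuid
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import Any, Callable, Dict, List


@dataclass(frozen=True)
class BusinessEvent:
    """A service-level event with a fixed envelope around business data."""
    type: str                 # for example "order.paid"
    source: str               # the service that emitted the event
    data: Dict[str, Any]      # business payload
    id: str = field(default_factory=lambda: str(uuid.uuid4()))
    time: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def to_json(self) -> str:
        return json.dumps(asdict(self), sort_keys=True)


class EventBus:
    """Delivers each event to the handlers subscribed to its type."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Callable[[BusinessEvent], None]]] = {}

    def subscribe(self, event_type: str,
                  handler: Callable[[BusinessEvent], None]) -> None:
        self._subscribers.setdefault(event_type, []).append(handler)

    def publish(self, event: BusinessEvent) -> int:
        """Deliver the event and return how many handlers received it."""
        handlers = self._subscribers.get(event.type, [])
        for handler in handlers:
            handler(event)
        return len(handlers)


bus = EventBus()
notified: List[BusinessEvent] = []
bus.subscribe("order.paid", notified.append)

event = BusinessEvent(
    type="order.paid", source="payment-service", data={"order_id": "order-42"}
)
delivered = bus.publish(event)
print(delivered, event.to_json())
```

A new application only needs to subscribe to an event type; the emitting service does not change, which is what makes such applications easy to extend.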

While promoting Airworkflow, the team found that acceptance of Serverless was still low. Although some business teams were willing to onboard, they all asked us to give a demo, explain the background, and guide them step by step with technical documents. Each onboarding usually took about a week, and with the next team we had to start all over again.

Public clouds run into the same problem when promoting their services. Their solution is frequent evangelism, which takes a lot of manpower and does not suit a private cloud.

For example, in its early days Didi faced a chicken-and-egg problem: without drivers there were no users, and without users there were no drivers. Solving this kind of problem requires establishing a positive cycle. We likewise hope to complete our own growth flywheel by building services.

Figure note: Build growth flywheel

Now that services have onboarded to Airworkflow, the team is working on the event bus, Eventgateway. Once the event bus is done, we can push the FaaS iteration forward.

Users' acceptance of Serverless grows as they use these services. Greater acceptance prompts them to try another service, and the leftmost loop in the figure is closed.

But what does it cost to get the loop going? It means sparing no effort in promotion, even teaching users hand in hand how to migrate. Because our manpower is limited, we want to build an ecosystem in which users who have already onboarded spread their experience.

Later, we will build a Dev App Store that packages some events and functions and delivers them to users as internal open source products. This low-threshold route lets users get to know Serverless and raises their acceptance of Serverless services. Using other services then becomes easier, so the services promote one another and close the loop for our entire offering.

Q&A

Question: The Serverless platform currently supports stateless functions. Will it support stateful functions in the future?

Liuyan: Stateful services are much more complex than stateless ones. If your service is stateful, you should first build it as a standalone microservice. Business scenarios often need to connect such microservices together.