www.cnblogs.com/yyzybb/ — Yu Yangzi \
Github.com/yyzybb537/l… \
Kiev Framework Overview
Kiev is the Linux C++ backend development framework currently used by the Meizu push platform. Since the project was established in 2012, it has been built up by a succession of Meizu senior architects and senior C++ engineers. As of this writing it has withstood nearly five years of testing by the push platform, a large-scale distributed system serving tens of millions of users. Today Kiev carries tens of billions of RPC calls per day across hundreds of services on the Meizu push platform.
Kiev is a full-featured C++ development framework for large distributed backend systems. It consists of the following components:
- RPC framework (TCP/UDP)
- FastCGI framework
- Redis client (a wrapper around hiredis)
- MySQL client (a wrapper around the MySQL client library)
- Mongo client
- Configuration center client (HTTP protocol, based on curl)
- Distributed components based on ZooKeeper (service discovery, load balancing)
- Logging module
- Status monitoring module
- Core module: libgo, an open-source coroutine library implementing the CSP concurrency model
Concurrency model
Kiev uses the CSP concurrency model (the model popularized by Golang), inherited from libgo. The main reason for choosing this model is that its development efficiency is far higher than that of the asynchronous callback model, with no compromise on performance. Below, we compare several common models in detail.
CSP model
CSP (Communicating Sequential Processes) is a popular concurrency model; it is the model used by the Golang language. In the CSP model, coroutines do not communicate with each other directly, nor do they deliver messages straight to a target coroutine as in the Actor model; instead, they exchange data through a Channel.
The benefit of this design is that the Channel middle layer reduces coupling between coroutines while preserving flexibility, which makes it ideal for writing concurrent programs.
RPC framework
Remote Procedure Call (RPC) is a remote-call protocol. Simply put, it lets an application call remote procedures or services as if they were local methods. It applies to many scenarios, such as distributed services, distributed computing, and remote service invocation. There are many excellent open-source RPC frameworks, such as Dubbo, Thrift, gRPC, and Hprose. RPC frameworks were developed to simplify network communication between backend services, letting developers focus on business logic instead of complex networking code. In our view, an RPC framework must do more than encapsulate network communication: to cope with hundreds of different services, tens of millions of users, and tens of billions of PVs, it must also be solid on high availability, load balancing, overload protection, backward-compatible communication protocols, graceful degradation, timeout handling, and the ability to start services in arbitrary order.
Service discovery
Kiev uses ZooKeeper for service discovery: at startup, each Kiev service registers a node on ZooKeeper carrying its address and protocol information. Under horizontal scaling, homogeneous services register under the same path, producing multiple nodes. When a service invokes a dependency, it queries ZooKeeper for the available nodes, selects a connection according to the load-balancing policy, and invokes that node.
Load balancing
There are two built-in load-balancing policies, round-robin and consistent hashing (conhash); custom policies can also be plugged in for specific business scenarios.
Overload protection
Kiev has a built-in overload-protection queue with 10 priority levels. Every incoming request first enters this queue, and worker coroutines then take requests out for processing. If the workers are slower than the arrival rate, the queue piles up and may fill. When it is full and a new request arrives, a lower-priority request is removed from the queue to make room. In addition, requests in the queue carry a time limit: a request that waits too long is discarded, so the system avoids processing requests that have already expired. This mechanism ensures that, under overload, the limited resources are spent on critical business.
Communication protocol backward compatibility
Because microservices architectures often require rolling (partial) releases, a communication protocol with backward compatibility is a necessary feature. Kiev uses Protobuf as its communication protocol.
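The reason Protobuf suits partial releases is its field-tag rules: an old binary simply skips tags it does not know, so a field can be added without breaking already-deployed services. An illustrative fragment (not Kiev's real schema; the two messages below represent two versions of the same file over time):

```protobuf
// v1, already deployed everywhere.
message SubscribeRequest {
  required string device_id = 1;
  optional string topic     = 2;
}

// v2, rolled out gradually: a new field with a fresh tag number.
// Old servers ignore tag 3; new servers treat its absence as "unset".
// Existing tag numbers must never be reused or given a new type.
message SubscribeRequest {
  required string device_id = 1;
  optional string topic     = 2;
  optional uint32 priority  = 3;
}
```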
Working with third-party libraries
The original Kiev was based on the asynchronous callback model, but many third-party libraries ship only synchronous APIs, which were hard to integrate. With libgo's hook mechanism, blocking calls inside synchronous third-party libraries are automatically adapted to the CSP concurrency model: the CPU time that would be wasted blocking is used to execute other coroutines. Third-party libraries built on the asynchronous callback model can likewise use a channel from the CSP model to wait for their callbacks to fire. This makes Kiev work smoothly with third-party libraries of both styles.
Kiev functional components structure diagram
(structure diagram from the original post, not reproduced)
Kiev history and technology selection
In 2012, Meizu's push business was just beginning to move from a traditional architecture to a microservices architecture. To improve development efficiency while splitting up the system, we decided to create a C++ development framework; this is where Kiev began.
The first version of Kiev used a multi-threaded synchronous model: business logic was written sequentially, which kept it very simple. However, the OS supports only a limited number of threads, and scheduling overhead grows non-linearly as threads are added, so it could not support high request concurrency.
As the number of users grew, we needed to support more concurrent requests, and since coroutines were not yet as popular as they are today, we rewrote Kiev with an asynchronous callback model. In those early days the business was simple enough that the asynchronous callback model could handle the development tasks.
Over the next few years we developed a number of services on Kiev's asynchronous callback model, and as we used it we saw more and more problems with logic fragmentation. Worse, long callback chains were sometimes entangled with finite state machines, making the code hard to maintain. Code snippets like the following were common:
(code screenshot from the original post, not reproduced)
To solve this problem, we introduced Tencent's open-source coroutine library libco to run synchronous-style code inside coroutines. Hook technology reclaims the time slices otherwise spent waiting on blocking IO: the CPU switches to other coroutines, then switches back to continue the logic once the IO event fires. Fragmented code like the above becomes continuous business logic with no manually maintained context; temporary data can live directly on the stack. The code then looks like this:
(code screenshot from the original post, not reproduced)
Libco, however, provides only coroutines and hooks; we had to schedule coroutine switches ourselves. To keep things simple, the RPC framework evolved a pattern of taking a connection from a pool on each RPC call, sending the request, waiting for the reply, and releasing the connection back to the pool. Each connection could carry only one request at a time, so the RPC protocol degenerated into half-duplex mode. To maintain performance, hundreds of TCP connections had to be established between every pair of dependent services; with dependencies horizontally scaled across many processes, a service would hold hundreds of connections to each of those processes. Total TCP connections could reach thousands or even tens of thousands, putting great pressure on the servers. The connection pattern is shown below, where each line represents hundreds of TCP connections.
(connection diagram from the original post, not reproduced)
Accordingly, we updated the Redis, MySQL, and FastCGI modules in Kiev to the coroutine model.
For the first few months this worked well, with decent performance (around 20K QPS for RPC requests). But as users and requests kept growing, shortly after a new product launch one of our non-critical businesses failed.
The failed service receives subscription requests from phones. When a subscription request times out (after about 30 seconds), the phone retries it. The system was overloaded: processing was slower than arrival, requests piled up in a queue, and the service's response time kept degrading. Eventually many requests timed out on the phone before they were even processed, triggering retries and an avalanche effect. We recovered by urgently adding servers. In the post-mortem, we concluded that the root cause was the lack of an overload-protection mechanism, so we decided to add an overload-protection queue to Kiev with 10 priority levels. Every incoming request first enters this queue, and worker coroutines take requests out for processing. When the queue is full, the lowest-priority request is removed to make room. Requests in the queue also carry a time limit; a request that waits too long is discarded, so requests that have already expired are never processed. As shown below:
(overload-protection queue diagram from the original post, not reproduced)
As machines multiplied, some businesses began issuing requests with extremely long call chains. (A long-chain request must be processed through many services in turn, and each service along the way releases its TCP connection only after all downstream services have finished or timed out; this pattern sharply limits system-wide request concurrency.) The pressure on TCP connection counts kept growing, and we finally had to consider full-duplex mode over a single connection. But libco's features were too basic to build a full-duplex RPC framework on. Around that time, a colleague had created an open-source project on GitHub called libgo, a coroutine library implementing a Golang-style CSP concurrency model, so we spent some time on a technical evaluation to see whether it could replace libco. The table below compares the two projects along the dimensions we cared about:
(libco vs. libgo comparison table from the original post, not reproduced)
After some research, we finally decided to use libgo instead of libco.
Implementing full-duplex RPC on top of the CSP model is straightforward. After sending each request, the client saves the request ID together with a channel and blocks on that channel; when a response arrives, the corresponding channel is found by ID and the data is written into it. With a single TCP connection, any number of requests can be in flight concurrently, and the TCP connection-management pressure of distributed horizontal scaling is no longer an issue. Because each RPC now needs fewer resources, performance improved greatly: RPC QPS rose easily past 100K, a figure that currently exceeds most open-source RPC frameworks.
Comparison with popular open-source frameworks
(benchmark comparison from the original post, not reproduced)
Copyright notice: This article is the blogger’s original article, shall not be reproduced without the permission of the blogger. Blog.csdn.net/tech_meizu/…