AI Technology Base Camp (rgznai100)

Contributors | Shawn, Zhou Xiang



VLDB (Very Large Data Bases), the premier conference in the database field, is the leading annual international forum for database researchers, vendors, practitioners, application developers, and users. This year's VLDB was held from August 28 to September 1 in Munich, Germany, and covered topics in data management, database, and information systems research.



AI Technology Base Camp noticed that VLDB also accepted a paper from the WeChat team this year, as shown in the figure below:




In the paper "PaxosStore: High-availability Storage Made Practical in WeChat", the WeChat team presents PaxosStore, the high-availability storage system used in WeChat. The system applies a combinational design in the storage layer, with different storage engines built for different storage models.

PaxosStore is characterized by extracting the Paxos-based distributed consensus protocol as middleware that is universally accessible to the underlying multi-model storage engines. This makes the storage engines easier to tune, maintain, scale, and extend.


According to the WeChat team, implementing a practical consistent read/write protocol is far more complex in engineering practice than in theory. To tackle this complexity, they propose a layered design for the Paxos-based storage protocol stack. PaxosLog, the key data structure used in the protocol, bridges programming-oriented consistent reads and writes to the storage-oriented Paxos procedure. The paper also presents Paxos-based optimizations that make fault tolerance more efficient, and discusses several pragmatic solutions for building practical distributed storage systems.


A summary of the paper follows.

Introduction

WeChat, one of the most popular mobile apps, has nearly 1 billion monthly active users (963 million as of June 30, according to Tencent's second-quarter earnings report). WeChat provides users with instant messaging, online social networking, mobile payment, and third-party authorization services. The back end supporting these integrated businesses consists of many functional components developed by different teams. Although their business logic differs, most back-end components require reliable storage support.


Initially, the development teams chose off-the-shelf storage systems for individual components more or less arbitrarily. However, the resulting fragmented storage systems not only took considerable effort to maintain but were also hard to scale. This called for a universal storage system to support WeChat's various services, and PaxosStore is the second-generation storage system developed for this purpose (the first-generation WeChat storage system was based on a quorum protocol).


Across WeChat's business units, the following storage requirements are routine.

First, the data exhibit the well-known three Vs of big data: volume, velocity, and variety. WeChat generates about 1.5 TB of data per day on average, spanning a variety of content such as text messages, images, audio, video, financial transactions, and Moments posts. During the day, tens of thousands of requests are issued every second, and single-record accesses dominate the workload.
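To put the daily volume in perspective, here is a back-of-the-envelope conversion. It assumes decimal units (1 TB = 10^12 bytes) and a uniform rate across the whole day, neither of which the text states:

```latex
\frac{1.5 \times 10^{12}\ \text{bytes/day}}{86\,400\ \text{s/day}} \approx 1.74 \times 10^{7}\ \text{bytes/s} \approx 17.4\ \text{MB/s}
```

Because traffic is concentrated in the daytime, peak ingest is likely well above this average.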

Second, high availability is the foremost requirement of the storage service. Most applications, such as peer-to-peer messaging, group chat, and browsing Moments posts, rely on PaxosStore, so its availability is critical to the user experience. Most WeChat applications require PaxosStore's latency overhead to stay within 20 ms, and this latency requirement must be met at city scale.

In building PaxosStore to provide high-availability storage services, we faced the following challenges:


  • Effective and efficient consistency guarantees. Fundamentally, PaxosStore uses the Paxos algorithm to handle consistency. Although the original Paxos protocol provides a consistency guarantee, its implementation complexity (for example, complicated state machines must be properly maintained and tracked) and its high runtime overhead (for example, the bandwidth required for synchronization) make the raw protocol impractical for supporting WeChat's comprehensive businesses.

  • Elasticity with low latency. PaxosStore must support low-latency reads and writes at city scale, and it must handle load surges gracefully at run time.

  • Automatic cross-datacenter fault tolerance. In production, PaxosStore spans thousands of commodity servers in multiple data centers around the world, and hardware failures and network outages are common at this scale. The fault-tolerance scheme must detect failures and recover from them without affecting overall system efficiency.

PaxosStore is a practical high-availability storage solution that provides robust storage support for the WeChat back end. It applies a combinational design in the storage layer, with different storage engines built for different storage models. PaxosStore is characterized by extracting the Paxos-based distributed consensus protocol as middleware that is universally accessible to the underlying multi-model storage engines. This Paxos-based storage protocol can be extended to support the various data structures that the programming model offers to different applications. Moreover, PaxosStore adopts a leaseless design, which improves system availability and fault tolerance. The main contributions of this paper are as follows:

  • We present the design of PaxosStore, focusing on the construction and operation of its consistent read/write protocol. By decoupling the consensus protocol from the storage layer, PaxosStore can be extended to support multiple storage engines built for different storage models.

  • Building on this design, we discuss PaxosStore's fault-tolerance scheme and the details of data recovery. The techniques described in this paper are fully deployed in a large-scale production environment, where they enable PaxosStore to achieve 99.9999% availability across all WeChat production applications (based on six months of operational data).

  • PaxosStore has supported the growing WeChat business for more than two years. Based on this practical experience, we discuss the trade-offs in the design of PaxosStore and present experimental evaluation results from our deployment.
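To give a sense of scale for the availability figure, assume availability means the fraction of time the service is up (the article does not define the metric) and take six months as 182.5 days. Under those assumptions, 99.9999% availability allows roughly 16 seconds of unavailability over the whole period:

```latex
(1 - 0.999999) \times 182.5 \times 86\,400\ \text{s} = 10^{-6} \times 15\,768\,000\ \text{s} \approx 15.8\ \text{s}
```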

Design

Overall architecture




                       
Figure 1: The overall architecture of PaxosStore

Figure 1 shows the overall architecture of PaxosStore, which has three layers. The programming model layer provides various data structures to external applications. The consensus layer implements the Paxos-based storage protocol. The storage layer contains multiple storage engines built for different storage models to meet different performance requirements. The main difference between PaxosStore's architecture and traditional storage designs is that PaxosStore extracts the implementation of the consensus protocol as middleware that provides data consistency guarantees to all underlying storage engines.


Traditional distributed storage systems are typically built on a single storage model, whose design and implementation are adjusted slightly to meet different application requirements. However, this often requires complex coordination among different components. Combining multiple off-the-shelf storage systems into one integrated system can meet diverse storage requirements, but it usually makes the whole system hard to maintain and less scalable. Moreover, each storage system embeds its own consistency protocol in its storage model, and the resulting divide-and-conquer approach to consistency is error-prone: the consistency guarantee of the overall system depends on how the consistency implementations of the individual subsystems fit together. Furthermore, applications that access data across models (that is, across multiple storage subsystems) can hardly benefit from the consistency support of any single underlying subsystem, and must implement a consistency protocol of their own for such cross-model access.

PaxosStore adopts a combinational storage design. Multiple storage models are implemented in the storage layer, which is concerned only with high availability, while consistency is handled by the consensus layer. This allows each storage engine, as well as the storage layer as a whole, to scale as required. Because the consensus protocol implementation is decoupled from the storage engines, all supported storage models benefit from its data consistency guarantees, which also makes cross-model data access easier to implement.
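The decoupling described above can be illustrated with a minimal Python sketch. All class and method names here are hypothetical, and the agreement step is deliberately stubbed as a simple majority-reachability check rather than Paxos. The point is only that the consensus layer talks to engines through one narrow interface, so engines built on different storage models plug in unchanged.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Optional


class StorageEngine(ABC):
    """Hypothetical narrow interface that every engine exposes upward."""

    @abstractmethod
    def get(self, key: str) -> Optional[bytes]: ...

    @abstractmethod
    def put(self, key: str, value: bytes) -> None: ...


class HashKVEngine(StorageEngine):
    """Toy key-value engine: keeps only the latest value per key."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    def get(self, key: str) -> Optional[bytes]:
        return self._data.get(key)

    def put(self, key: str, value: bytes) -> None:
        self._data[key] = value


class VersionedEngine(StorageEngine):
    """Toy engine built on a different model: keeps every version of a key."""

    def __init__(self) -> None:
        self._versions: Dict[str, List[bytes]] = {}

    def get(self, key: str) -> Optional[bytes]:
        versions = self._versions.get(key)
        return versions[-1] if versions else None

    def put(self, key: str, value: bytes) -> None:
        self._versions.setdefault(key, []).append(value)


class ConsensusLayer:
    """Owns the replication policy; depends only on the StorageEngine interface.

    Stub: a write is applied only when a majority of replicas are marked up.
    This stands in for a real consensus protocol such as Paxos.
    """

    def __init__(self, replicas: List[StorageEngine]) -> None:
        self.replicas = replicas
        self.up = [True] * len(replicas)

    def _live(self) -> List[StorageEngine]:
        return [r for r, ok in zip(self.replicas, self.up) if ok]

    def write(self, key: str, value: bytes) -> bool:
        live = self._live()
        if 2 * len(live) <= len(self.replicas):
            return False  # no majority reachable: reject the write
        for replica in live:
            replica.put(key, value)
        return True

    def read(self, key: str) -> Optional[bytes]:
        live = self._live()
        # Toy read from one live replica; a real protocol must also confirm
        # that this replica is up to date before answering.
        return live[0].get(key) if live else None
```

The same `ConsensusLayer` class drives both toy engines without knowing how they store data, which mirrors the argument that consistency logic written once can serve every storage model.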

The design and implementation of the programming model layer are technically straightforward, so this section focuses on the design details of PaxosStore's consensus layer and storage layer.

Consensus Layer

                    

Figure 2: Storage protocol stack



                                                        

Algorithm 1: Paxos implementation in PaxosStore
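Algorithm 1 itself is not reproduced in this article. For orientation, the following is a minimal in-memory sketch of classic single-decree Paxos, the protocol PaxosStore builds on. It is the textbook algorithm, not the paper's Algorithm 1, and the names `Acceptor` and `propose` are illustrative. It assumes proposal numbers are unique across proposers and that messages are delivered as direct function calls.

```python
from dataclasses import dataclass
from typing import Any, List, Optional, Tuple


@dataclass
class Acceptor:
    promised: int = 0      # highest proposal number promised so far
    accepted_n: int = 0    # number of the accepted proposal (0 = none yet)
    accepted_v: Any = None # value of the accepted proposal, if any

    def prepare(self, n: int) -> Tuple[bool, int, Any]:
        """Phase 1b: promise to ignore proposals numbered below n."""
        if n > self.promised:
            self.promised = n
            return True, self.accepted_n, self.accepted_v
        return False, self.accepted_n, self.accepted_v

    def accept(self, n: int, v: Any) -> bool:
        """Phase 2b: accept (n, v) unless a higher number was promised."""
        if n >= self.promised:
            self.promised = n
            self.accepted_n, self.accepted_v = n, v
            return True
        return False


def propose(acceptors: List[Acceptor], n: int, value: Any) -> Optional[Any]:
    """Run one round with proposal number n; return the chosen value or None."""
    majority = len(acceptors) // 2 + 1

    # Phase 1a: send prepare(n) and collect promises.
    promises = [a.prepare(n) for a in acceptors]
    granted = [(an, av) for ok, an, av in promises if ok]
    if len(granted) < majority:
        return None

    # Safety rule: if any promiser already accepted a value, adopt the one
    # with the highest proposal number instead of our own value.
    highest_n, highest_v = max(granted, key=lambda p: p[0])
    if highest_n > 0:
        value = highest_v

    # Phase 2a: ask acceptors to accept (n, value); chosen once a majority agrees.
    acks = sum(a.accept(n, value) for a in acceptors)
    return value if acks >= majority else None
```

Usage: with three acceptors, `propose(acc, 1, "A")` chooses `"A"`; a later `propose(acc, 2, "B")` still returns `"A"` because phase 1 reveals the earlier choice, and a stale `propose(acc, 1, "C")` returns `None`.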





                                                             
Figure 3: PaxosLog structure
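The figure itself is missing here. As a rough, hypothetical illustration of the general idea of a Paxos log, the sketch below keeps one record per consensus instance and applies chosen values strictly in instance order. The field and class names (`promise_num`, `proposal_num`, `PaxosLogSketch`, and so on) are assumptions for illustration, not the paper's actual PaxosLog layout.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class LogEntry:
    """Hypothetical record for one consensus instance."""
    instance_id: int
    promise_num: int = 0    # highest proposal number promised for this instance
    proposal_num: int = 0   # proposal number of the accepted value
    value: Any = None       # accepted (or chosen) value
    chosen: bool = False    # True once a majority has accepted the value


@dataclass
class PaxosLogSketch:
    """Hypothetical log keyed by instance ID; values take effect in order."""
    entries: Dict[int, LogEntry] = field(default_factory=dict)
    applied_upto: int = 0   # highest instance ID already applied to the data

    def entry(self, instance_id: int) -> LogEntry:
        return self.entries.setdefault(instance_id, LogEntry(instance_id))

    def mark_chosen(self, instance_id: int, value: Any) -> None:
        e = self.entry(instance_id)
        e.value, e.chosen = value, True

    def apply_ready(self, apply: Callable[[Any], None]) -> List[int]:
        """Apply chosen entries in instance order, stopping at the first gap."""
        applied: List[int] = []
        nxt = self.applied_upto + 1
        while nxt in self.entries and self.entries[nxt].chosen:
            apply(self.entries[nxt].value)
            applied.append(nxt)
            self.applied_upto = nxt
            nxt += 1
        return applied
```

Stopping at the first gap is the key design choice: if instance 2 is chosen before instance 1, nothing is applied until instance 1 is also chosen, so every replica applies values in the same order.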

                           

                                                 
Figure 4: The PaxosLog + Data object

                                       

Figure 5: PaxosLog-as-value for key-value storage



                                               
Figure 6: An example of the read process in PaxosStore

Storage Layer




                                   
Figure 7: Example intra-region deployment of PaxosStore with C = 2

Fault tolerance and availability




                                       
Figure 8: Data recovery methods in PaxosStore

Evaluation


Experimental setup




                             
Figure 9: Overall read/write performance of PaxosStore and effectiveness of the prepare optimization

Latency

          

Figure 10: Measurements across mini-cluster sizes

Fault recovery




                                   
Figure 11: Monitoring fault recovery of a PaxosStore node in WeChat

Effectiveness of PaxosLog-entry batched application


         

                                       
Figure 12: Effectiveness of PaxosLog-entry batched application
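As a hypothetical illustration of why applying log entries in batches can pay off, the sketch below compares applying already-chosen entries one storage write at a time with applying them in batches. `CountingStore`, `apply_one_by_one`, and `apply_batched` are made-up names, and the sketch assumes each batch write is atomic; it is not a description of PaxosStore's actual mechanism.

```python
from typing import Dict, List, Tuple

Entry = Tuple[str, str]  # (key, value) of an already-chosen log entry, in log order


class CountingStore:
    """Toy storage engine that counts write calls; each call models one I/O."""

    def __init__(self) -> None:
        self.data: Dict[str, str] = {}
        self.write_calls = 0

    def write_batch(self, updates: Dict[str, str]) -> None:
        # Assumed atomic: all updates in the batch become visible together.
        self.write_calls += 1
        self.data.update(updates)


def apply_one_by_one(store: CountingStore, entries: List[Entry]) -> None:
    for key, value in entries:
        store.write_batch({key: value})


def apply_batched(store: CountingStore, entries: List[Entry], max_batch: int = 64) -> None:
    for start in range(0, len(entries), max_batch):
        batch: Dict[str, str] = {}
        for key, value in entries[start:start + max_batch]:
            batch[key] = value  # a later entry for the same key supersedes an earlier one
        store.write_batch(batch)
```

Both functions leave the store in the same final state, but the batched version issues one write per batch rather than one per entry; `max_batch` caps how much work a single write carries.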

Conclusion

In this paper, we presented PaxosStore, a high-availability storage system that sustains tens of millions of consistent reads/writes per second. The storage protocol in PaxosStore is based on the Paxos algorithm for distributed consensus, and we further enhanced it with practical optimizations, including PaxosLog-as-value and a concise PaxosLog structure for key-value storage. A fault-tolerance scheme based on fine-grained data checkpoints enables PaxosStore to recover quickly from failures without system downtime. PaxosStore has been implemented and deployed in WeChat, providing storage support for WeChat's integrated services such as instant messaging and social networking.
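To make the idea of fine-grained recovery concrete, here is a small sketch under an assumed interpretation: each key carries its own version number, so a recovering replica copies only the keys that lag a healthy peer instead of restoring the whole data set. `Replica`, `stale_keys`, and `recover_from` are hypothetical names; the paper's actual checkpoint format is not shown in this article.

```python
from typing import Dict, List, Tuple


class Replica:
    """Toy replica storing a (version, value) pair per key."""

    def __init__(self) -> None:
        self.store: Dict[str, Tuple[int, str]] = {}

    def write(self, key: str, version: int, value: str) -> None:
        current = self.store.get(key, (0, ""))[0]
        if version > current:  # ignore stale or duplicate writes
            self.store[key] = (version, value)

    def stale_keys(self, peer: "Replica") -> List[str]:
        """Keys whose per-key version lags the peer's (assumed checkpoint granularity)."""
        return sorted(k for k, (v, _) in peer.store.items()
                      if self.store.get(k, (0, ""))[0] < v)

    def recover_from(self, peer: "Replica") -> List[str]:
        """Copy only the lagging keys from a healthy peer; up-to-date keys are untouched."""
        lagging = self.stale_keys(peer)
        for k in lagging:
            version, value = peer.store[k]
            self.write(k, version, value)
        return lagging
```

Because recovery works key by key, keys that are already current can keep serving requests while the lagging ones catch up.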


While developing PaxosStore, we learned several design principles and lessons:

  • Rather than supporting storage diversity through a single compromised storage engine, it is advisable to design a storage layer that supports multiple storage engines built for different storage models. This makes targeted performance tuning easier as operational and maintenance conditions change.

  • Besides faults and failures, system overload is a critical factor affecting availability. In particular, the design of a fault-tolerance scheme must pay close attention to the avalanche effect that overload can trigger. A concrete example is PaxosStore's use of mini-clusters.

  • The design of PaxosStore relies heavily on an event-driven, message-passing mechanism, whose implementation can involve many asynchronous state-machine transitions. In building PaxosStore, we developed a framework based on coroutines and socket hooks that lets asynchronous procedures be written in a pseudo-synchronous style. This eliminates error-prone callbacks and simplifies the implementation of asynchronous logic.
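The coroutine framework mentioned above is not described in detail here. The same pseudo-synchronous style can be sketched with Python's standard `asyncio`, which is a different technology from the framework the authors built: a replication step reads like blocking code, while the event loop overlaps the network waits. `send_to_replica` and `replicate` are hypothetical names, and the network round trip is simulated with a sleep.

```python
import asyncio
from typing import List


async def send_to_replica(replica_id: int, msg: str) -> str:
    """Hypothetical stand-in for a network round trip to one replica."""
    await asyncio.sleep(0.01 * replica_id)  # simulated network delay
    return f"ack:{replica_id}:{msg}"


async def replicate(msg: str, replicas: List[int]) -> List[str]:
    # Written top to bottom like synchronous code; no callbacks are needed.
    # asyncio.gather runs all round trips concurrently and returns the
    # results in the same order as the replicas list.
    acks = await asyncio.gather(*(send_to_replica(r, msg) for r in replicas))
    return list(acks)


if __name__ == "__main__":
    print(asyncio.run(replicate("write-1", [1, 2, 3])))
```

The callback-based equivalent would split `replicate` into a chain of handlers, each of which must carry the partial state forward; the coroutine keeps that state in ordinary local variables.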


Original paper:

http://www.vldb.org/pvldb/vol10/p1730-lin.pdf