| AI base of science and technology (rgznai100)
Contributors | Shawn, Zhou Xiang
VLDB (Very Large Data Bases) is the premier annual international forum for database researchers, vendors, practitioners, application developers, and users. This year’s VLDB was held from August 28 to September 1 in Munich, Germany, and covered research issues in data management, databases, and information systems.
In the paper “PaxosStore: High-availability Storage Made Practical in WeChat”, the WeChat team introduced PaxosStore, the high-availability storage system behind WeChat. The system adopts a combinational design in the storage layer, with different storage engines built for different storage models.
In engineering practice, implementing a practical consistent read/write protocol is far more complex than the theory suggests. To address this, the team proposed a layered storage protocol stack based on Paxos.
PaxosLog, the key data structure used in the protocol, bridges the programming-oriented consistent read/write interface and the storage-oriented Paxos procedure. The paper also introduces Paxos-based optimizations that make fault tolerance more efficient, and it discusses several practical solutions to the problem of building a practical distributed storage system.
The following is the introduction of the paper:
Introduction
Within WeChat's business units, the following storage service requirements are routine.
First, the well-known three V's of big data (volume, velocity, and variety) are all present. WeChat generates an average of about 1.5 terabytes of data every day, covering a variety of content such as text messages, images, audio, video, financial transactions, and Moments posts. During the day, tens of thousands of requests are processed every second, and single-record accesses dominate the workload.
Second, high availability is the primary requirement for the storage service. Most WeChat applications, such as point-to-point messaging, group chat, and browsing Moments posts, depend on PaxosStore, so its availability is critical to the user experience. Most of these applications require PaxosStore to add no more than 20 ms of latency, and this requirement must be met at urban scale.
Building a storage system that meets these requirements raises several challenges:
- Effective and efficient consistency guarantee. Fundamentally, PaxosStore uses the Paxos algorithm to handle consistency. Although the original Paxos protocol guarantees consistency in theory, its implementation complexity (for example, the elaborate state machines that must be maintained and traced correctly) and high runtime overhead (for example, the bandwidth needed for synchronization) make it insufficient on its own to support WeChat's comprehensive business.
- Elasticity and low latency. PaxosStore must support low-latency reads and writes at urban scale, and load surges at run time must be handled properly.
- Automatic cross-datacenter fault tolerance. In production, PaxosStore is deployed on thousands of commodity servers across multiple data centers around the world. Hardware failures and network outages are common at this scale, so the fault-tolerance scheme must detect and recover from failures without affecting the overall efficiency of the system.
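To make the "state machines" mentioned above concrete, here is a minimal in-memory sketch of textbook single-decree Paxos. It is not PaxosStore's implementation: the `Acceptor` class, the `propose` helper, and the direct method calls standing in for network messages are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Acceptor:
    """State one replica keeps for a single Paxos instance (illustrative)."""
    promised: int = 0                      # highest proposal number promised
    accepted_n: int = 0                    # number of the accepted proposal (0 = none)
    accepted_value: Optional[str] = None

    def prepare(self, n: int) -> Tuple[bool, int, Optional[str]]:
        # Phase 1: promise to ignore proposals numbered below n.
        if n > self.promised:
            self.promised = n
            return True, self.accepted_n, self.accepted_value
        return False, self.accepted_n, self.accepted_value

    def accept(self, n: int, value: str) -> bool:
        # Phase 2: accept unless a higher-numbered promise was made.
        if n >= self.promised:
            self.promised = n
            self.accepted_n = n
            self.accepted_value = value
            return True
        return False


def propose(acceptors: List[Acceptor], n: int, value: str) -> Optional[str]:
    """Run both phases against all acceptors; return the chosen value or None."""
    majority = len(acceptors) // 2 + 1

    replies = [a.prepare(n) for a in acceptors]
    granted = [(an, av) for ok, an, av in replies if ok]
    if len(granted) < majority:
        return None

    # If any acceptor already accepted a value, adopt the highest-numbered one.
    prior_n, prior_value = max(granted, key=lambda r: r[0])
    if prior_n > 0 and prior_value is not None:
        value = prior_value

    accepted = sum(a.accept(n, value) for a in acceptors)
    return value if accepted >= majority else None
```

With three fresh acceptors, `propose(acceptors, 1, "a")` chooses `"a"`, and a later `propose(acceptors, 2, "b")` still returns `"a"`, because a value that has been chosen must be preserved. Even this toy version tracks three pieces of state per replica per instance, which hints at why a production implementation is hard to get right.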
PaxosStore is a practical high-availability storage solution that provides storage support for the WeChat back end. The system adopts a combinational design in the storage layer, with different storage engines built for different storage models. Its key feature is that it extracts the Paxos-based distributed consensus protocol into a middleware layer that all of the underlying multi-model storage engines can use. This Paxos-based storage protocol can be extended to support the variety of data structures exposed by the programming models built for different applications. Moreover, PaxosStore adopts a lease-free design, which benefits system availability and fault tolerance. The main contributions of the paper can be summarized as follows:
- We present the design of PaxosStore, focusing on the construction and operation of the consistent read/write protocol. Because the consensus protocol is decoupled from the storage layer, PaxosStore can be extended to support multiple storage engines built for different storage models.
- Based on the design of PaxosStore, we further discuss the fault-tolerance scheme and the details of data recovery. The techniques described in the paper have been fully implemented in a large-scale production environment, enabling PaxosStore to achieve 99.9999% availability across all production applications of WeChat (based on six months of operational data).
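To put the availability figure in perspective, the short calculation below converts it into a downtime budget. It assumes that availability is measured as a fraction of wall-clock time and that six months is about 182.5 days. Neither assumption comes from the paper, which may define availability differently (for example, per request).

```python
def downtime_budget_seconds(availability: float, days: float) -> float:
    """Seconds of unavailability allowed over `days` at the given availability."""
    return (1.0 - availability) * days * 24 * 60 * 60


# Assumed inputs: "six nines" over roughly half a year.
budget = downtime_budget_seconds(0.999999, 182.5)
print(f"{budget:.1f} s")  # about 15.8 seconds
```

Under these assumptions, the whole service could be unreachable for only about 16 seconds in total over the six-month window.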
PaxosStore has supported WeChat's growing business for more than two years. Based on this practical experience, we discuss the trade-offs in the design of PaxosStore and present experimental results from our deployment.
Design
Figure 1: The overall architecture of PaxosStore
Traditional distributed storage systems are generally built on a single storage model, with the design and implementation tweaked to meet different application requirements. However, this often requires complex coordination between components. Combining multiple off-the-shelf storage systems into one integrated system can satisfy diverse storage requirements, but it tends to make the whole system hard to maintain and less scalable. Moreover, each storage system embeds its own data consistency protocol in its storage model, so this divide-and-conquer approach to consistency is error-prone: the consistency guarantee of the overall system depends on how the guarantees of the individual subsystems combine. Furthermore, applications that access data across models (that is, across multiple storage subsystems) can hardly reuse the consistency support of any underlying subsystem and must implement a separate consistency protocol for such cross-model access.
PaxosStore takes a combinational approach to storage design. Multiple storage models are implemented in the storage layer, which is concerned only with high availability, while consistency is handled by the consensus layer. This allows each storage engine, and the storage layer as a whole, to scale on demand. Because the consensus protocol implementation is decoupled from the storage engines, every supported storage model can rely on the same data consistency guarantee. This also makes cross-model data access easier to implement.
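The decoupling described above can be sketched as a consensus layer that depends only on a narrow engine interface. Every name here (`StorageEngine`, `HashEngine`, `AppendEngine`, `ConsensusLayer`) is hypothetical and chosen for illustration. The consensus round itself is replaced by a plain list append, so this shows only the layering, not the PaxosStore protocol.

```python
from typing import Dict, List, Optional, Protocol, Tuple


class StorageEngine(Protocol):
    """What the consensus layer needs from any engine (hypothetical interface)."""
    def apply(self, key: str, value: str) -> None: ...
    def get(self, key: str) -> Optional[str]: ...


class HashEngine:
    """Key-value model backed by a dict."""
    def __init__(self) -> None:
        self._data: Dict[str, str] = {}

    def apply(self, key: str, value: str) -> None:
        self._data[key] = value

    def get(self, key: str) -> Optional[str]:
        return self._data.get(key)


class AppendEngine:
    """Queue-like model that keeps every write in arrival order."""
    def __init__(self) -> None:
        self._rows: List[Tuple[str, str]] = []

    def apply(self, key: str, value: str) -> None:
        self._rows.append((key, value))

    def get(self, key: str) -> Optional[str]:
        # Latest write wins: scan from the newest row backwards.
        for k, v in reversed(self._rows):
            if k == key:
                return v
        return None


class ConsensusLayer:
    """Orders writes once, then hands them to whichever engine is plugged in."""
    def __init__(self, engine: StorageEngine) -> None:
        self._engine = engine
        self._log: List[Tuple[str, str]] = []  # stand-in for an agreed-upon log

    def write(self, key: str, value: str) -> int:
        # A real system would run a consensus round here before appending.
        self._log.append((key, value))
        self._engine.apply(key, value)
        return len(self._log)                  # position of the entry in the log

    def read(self, key: str) -> Optional[str]:
        return self._engine.get(key)
```

Swapping `HashEngine()` for `AppendEngine()` changes nothing in `ConsensusLayer`, which is the property the paragraph describes: the ordering logic is written once and shared by every storage model.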
The design and implementation of the programming model layer are technically straightforward, so this section focuses on the design details of the consensus layer and the storage layer of PaxosStore.
Consensus Layer
Figure 2: Storage protocol stack
Figure 3: PaxosLog structure
Figure 4: The PaxosLog + Data object
Figure 5: PaxosLog-as-value for key-value storage
Figure 6: An example of the read process in PaxosStore
Figure 7: Sample intra-area deployment of PaxosStore, C = 2
Figure 8: Data recovery methods in PaxosStore
Figure 9: Overall read/write performance and effectiveness of preparation optimizations for PaxosStore
Figure 10: Measurement of mini-cluster size
Figure 11: Monitoring the failure recovery of a PaxosStore node in WeChat
Figure 12: Effectiveness of batched application of PaxosLog entries
Conclusion
During the development of PaxosStore, we distilled several design principles and lessons:
- Rather than supporting storage diversity through a compromised single storage engine, design a storage layer that supports multiple storage engines built for different storage models. This makes targeted performance tuning easier as operational conditions change.
- Apart from faults and failures, system overload is a key factor affecting availability. When designing a fault-tolerance scheme in particular, pay close attention to the potential avalanche effect caused by overload. A concrete example is the use of mini-cluster groups in PaxosStore.
- The design of PaxosStore relies heavily on a message-passing, event-driven mechanism, which involves a large number of asynchronous state machine transitions in the logic. In our engineering practice for building PaxosStore, we developed a framework based on coroutines and socket hooks that lets asynchronous procedures be programmed in a pseudo-synchronous style. This helps eliminate error-prone callback functions and simplifies the implementation of asynchronous logic.
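The pseudo-synchronous style in the last point can be illustrated with Python's `asyncio`, used here as a stand-in for the team's own coroutine framework, which the article does not show. The `send_and_wait` helper, the peer names, and the two-step flow are assumptions. A short sleep simulates the network round trip, and socket hooking is not modelled.

```python
import asyncio
from typing import List


async def send_and_wait(peer: str, message: str) -> str:
    """Stand-in for a network round trip; a real version would use sockets."""
    await asyncio.sleep(0.01)  # yields to the event loop instead of blocking
    return f"{peer}:ack:{message}"


async def replicate(peers: List[str], value: str) -> List[str]:
    # The two steps read top to bottom like blocking code, with no callbacks.
    # Each await suspends only this coroutine, so other requests keep running.
    promises = await asyncio.gather(*(send_and_wait(p, "prepare") for p in peers))
    accepts = await asyncio.gather(*(send_and_wait(p, f"accept {value}") for p in peers))
    return list(promises) + list(accepts)


def run_demo() -> List[str]:
    return asyncio.run(replicate(["node-a", "node-b"], "v1"))
```

Written with callbacks, the second step would have to live inside a handler invoked when the first step completes, and every piece of intermediate state would need to be carried along by hand. With coroutines, local variables such as `promises` simply survive across the suspension points.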
Original paper:
http://www.vldb.org/pvldb/vol10/p1730-lin.pdf