The motivation for this article is that many colleagues who have worked in application development for a long time are still confused about the overall system picture, especially databases and distributed application coordination. This article therefore takes the perspective of application development engineers and architects and discusses, as one series, the CAP theorem as it applies to the deployment architecture of relational database systems (such as MySQL), the Paxos-based ZAB algorithm in ZooKeeper and the Raft algorithm in etcd under trusted networks, and some related topics in blockchain systems. Because the article is written within the limits of an application development engineer’s own experience (the author is not a DBA), some of the descriptions may be less than “professional.” If you spot any errors in the descriptions, please point them out in the comments. Thank you.

What is the CAP theorem

Referring to the Wikipedia explanation of the CAP theorem:

In theoretical computer science, the CAP theorem, also known as Brewer’s theorem, states that it is impossible for a distributed computing system to satisfy all three of the following guarantees at the same time:

  • Consistency (all nodes see the same, most recent copy of the data)
  • Availability (every request receives a non-error response, though without a guarantee that the data it returns is the most recent)
  • Partition tolerance (in practical terms, a partition amounts to a time limit on communication: if the system cannot reach data consistency within that limit, a partition has occurred, and it must choose between C and A for the current operation)

Technical definitions are usually hard to digest, so let me try to describe them in plain language:

  • C - Consistency: no matter which node a client accesses, it should see the same data. (Note that the C in CAP is different from the C in database ACID transactions: CAP’s C refers to linearizable consistency, while the C in database transactions is better understood as logical consistency.)
  • A - Availability: even if some node goes down, the system can still respond to all external requests normally, without large-scale failures or timeouts.
  • P - Partition tolerance: when network communication between nodes breaks down, the system as a whole can continue to provide service rather than simply collapse.

So why can’t all three be achieved at once? Let’s sketch a simple argument, scenario by scenario.

First, availability

Generally speaking, our top priority must be the overall availability of the system. In order to improve availability and prevent external services from being unavailable due to a single point of failure, we will try to add multiple nodes to the system to provide services together.

Second, consistency

However, adding nodes introduces a new problem: the states and data on the multiple nodes may not be consistent. We then try various means to improve consistency across the nodes. The ideal pattern is that every state and data change is synchronized to all nodes before any single node responds, and only then is the response returned to the client.

Finally, network partitions

But what if the network suffers severe latency, or even partitions outright and cuts the system into two halves? Each partition may still receive requests from clients in its own network area. If the system must preserve consistency, then it should return failures directly, which lowers availability; if, on the other hand, it favors availability and continues to serve requests, then the data in the partitions cannot stay consistent for some period of time, and clients accessing the same system from different network areas may see different states and data.

Conclusion

So, as we have seen, a distributed system cannot satisfy all three CAP requirements at the same time. Perhaps the greatest value of the CAP theorem is that it pushes system architects and developers to balance C, A, and P properly according to the characteristics of their own systems. Although the CAP theorem provides a good methodology, I still recommend not treating it as a hammer that makes everything look like a nail, simply using CAP to pigeonhole mainstream software. For example, MySQL is classified as a CA system in most taxonomies, yet with specific HA schemes applied, MySQL can also improve P at the expense of C. Next, we will take the common MySQL deployment schemes and analyze them against the three CAP properties.

MySQL consistency trade-off

First of all, the consistency we are talking about here is consistency in distributed systems. It can be divided, roughly, into two types: strong consistency and weak consistency. There is also the commonly used term eventual consistency, which for now we can classify as a form of weak consistency.

Weak consistency

Weak consistency means that after an update, readers may see only part of the new data, or none of it, for some time. The most typical example of weak consistency is eventual consistency, and the most familiar scenario of eventual consistency is DNS. If we submit a change to a DNS record today, the change may not be visible across the whole network right away, but within about 72 hours every node on the network will reflect the updated record.

Strong consistency

Strong consistency in distributed systems is often referred to as atomic consistency or linearizability. It requires that every read of a piece of data returns the value of the most recent write to it, and that all processes in the system observe operations in the same order as they would under a global clock. Simply put, the data states on all nodes are exactly the same.

MySQL Replication

The main way MySQL handles multi-node consistency is Replication, which synchronizes and backs up data across multiple nodes in a cluster to improve the availability of that data. Traditionally, the most common method is log shipping. The default mode MySQL provides is asynchronous replication: after the transaction is written to the local log, it is committed immediately without waiting for confirmation from any other node, and the other nodes later replay the operations from those logs on their own schedule. Under this mode, the deployment schemes we usually see are master-slave synchronization and multi-master synchronization.
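
To make the asynchronous mode concrete, here is a minimal sketch of the master-side preparation for classic replication. The account name and password are placeholders, and the server_id and log_bin settings are assumed to already be in my.cnf.

```sql
-- Master side: create an account the slave can use to pull the binary log.
-- (Assumes my.cnf already sets server_id=1 and log_bin=mysql-bin.)
CREATE USER 'repl'@'%' IDENTIFIED BY 'repl_password';
GRANT REPLICATION SLAVE ON *.* TO 'repl'@'%';

-- Record the current binary log file and position; the slave starts from here.
SHOW MASTER STATUS;
```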

Master-slave synchronization

After MySQL Replication is enabled, the Slave node watches the Master node’s bin-log; whenever it finds new events, it pulls the log to its own node and replays it to stay in sync. In traditional MySQL Replication, the Master node is not even aware that any Slave exists; you only need to configure the Master’s connection information and log position on the Slave instance. On top of this, clients typically access the data with a read/write-splitting model, which improves overall availability (A) at the cost of some consistency (C). This also shows that balancing CAP is not purely a server-side concern; it can be adjusted through the client’s usage patterns as well.
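
On the Slave side, a minimal sketch of the “connection information and log position” mentioned above looks roughly like this; the host, credentials, and log coordinates are placeholders (on MySQL 8.0.23+ the equivalent statements are CHANGE REPLICATION SOURCE TO / START REPLICA).

```sql
-- Slave side: point the replica at the master and start replication.
CHANGE MASTER TO
  MASTER_HOST = '10.0.0.1',
  MASTER_USER = 'repl',
  MASTER_PASSWORD = 'repl_password',
  MASTER_LOG_FILE = 'mysql-bin.000001',
  MASTER_LOG_POS = 154;

START SLAVE;

-- Check that both the I/O thread and the SQL thread are running.
SHOW SLAVE STATUS\G
```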

Multi-master synchronization

Multi-master synchronization is a deployment mode in which two (or more) MySQL instances each act as both master and slave to the other. The idea sounds nice, but only those who have run it in practice know how painful multi-master synchronization can be. The first problem is primary key conflicts: because the default replication mode is asynchronous (not strongly consistent), there is always a time lag between the data on the two nodes, so the two masters can commit rows with the same primary key and collide. In MySQL 5.7 and later, this was partially mitigated by GTID-based (global transaction ID) replication, as opposed to replication based on bin-log positions alone. Still, because the two nodes communicate asynchronously, conflicts between a transaction committed on one node and data on the other node cannot be completely avoided.
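
As an illustration, here is a hedged sketch of what the GTID-based mode looks like on a replicating node, plus the classic auto-increment interleaving that dual-master setups often use to sidestep primary key collisions. All hosts and credentials are placeholders, and gtid_mode/enforce_gtid_consistency are assumed to be enabled in my.cnf.

```sql
-- GTID-based replication (MySQL 5.7+): with GTIDs the replica no longer
-- tracks bin-log file/position coordinates by hand.
CHANGE MASTER TO
  MASTER_HOST = '10.0.0.2',
  MASTER_USER = 'repl',
  MASTER_PASSWORD = 'repl_password',
  MASTER_AUTO_POSITION = 1;
START SLAVE;

-- A common (if crude) way to keep two masters from generating the same
-- auto-increment primary keys: node 1 uses odd IDs, node 2 uses even IDs.
-- (Set persistently in my.cnf in real deployments.)
SET GLOBAL auto_increment_increment = 2;
SET GLOBAL auto_increment_offset   = 1;  -- use 2 on the other master
```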

Advantages and disadvantages of the master/slave modes under asynchronous replication

The disadvantages are obvious. First, because replication is asynchronous, there is no guarantee of strong consistency, and letting multiple nodes take writes at the same time (multi-master mode) brings plenty of data problems. Second, in single-master mode, if the master fails, the system is basically unusable by default; and if a master/slave failover is performed, some data may be lost because of the consistency gap. Finally, although multi-master mode mitigates the single point of failure to some extent (improving A), it introduces the problem of two Master nodes serving independently when a network partition occurs (P falls apart). That said, this model still has plenty of advantages:

  1. Easy to deploy, with a cost advantage;
  2. The Master node alone handles writes while the Slave nodes serve reads; for core services with high consistency requirements, both reads and writes can be directed to the Master node.
  3. Because Slave nodes synchronize through logs, they can replicate only the tables they care about, so a Slave’s data can be a subset of the Master’s. A Slave can even use a different storage engine and different indexes from the Master, leaving plenty of room to tune the Slave’s engine and indexes for its own workload (see the sketch after this list).
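
As a concrete, purely illustrative example of point 3, here is a sketch of a Slave that replicates only two hypothetical tables and adds an index of its own; the schema, table, and index names are made up (the same filters can also be set in my.cnf via replicate-do-table).

```sql
-- Replicate only the tables this Slave cares about.
-- The SQL thread must be stopped before changing replication filters.
STOP SLAVE SQL_THREAD;
CHANGE REPLICATION FILTER REPLICATE_DO_TABLE = (shop.orders, shop.order_items);
START SLAVE SQL_THREAD;

-- An extra index that exists only on the Slave, to serve reporting queries.
ALTER TABLE shop.orders ADD INDEX idx_orders_created_at (created_at);
```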

Is there an alternative to asynchronous synchronization

Looking back at the asynchronous mode, we see that although the system has multiple nodes, there is only one Master; as soon as the Master is unavailable, the whole system can effectively be considered unavailable. To solve this problem, we officially enter the world of distributed architectures. Among MySQL’s distributed architecture schemes, the common ones are:

  • MySQL Group Replication (MGR; a minimal configuration sketch follows this list)
  • MySQL Fabric
  • MySQL Cluster
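
To give a feel for the first option, here is a heavily simplified sketch of bootstrapping the first MySQL Group Replication node. The group UUID, hosts, and ports are placeholders, and the usual prerequisites (GTIDs, ROW-format bin-logs, a recovery replication user) are assumed to be in place.

```sql
-- Load the group replication plugin on this node.
INSTALL PLUGIN group_replication SONAME 'group_replication.so';

-- Identify the group and this node's communication endpoints (placeholders).
SET GLOBAL group_replication_group_name    = 'aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee';
SET GLOBAL group_replication_local_address = '10.0.0.1:33061';
SET GLOBAL group_replication_group_seeds   = '10.0.0.1:33061,10.0.0.2:33061,10.0.0.3:33061';

-- Only the first node bootstraps the group; the others just START GROUP_REPLICATION.
SET GLOBAL group_replication_bootstrap_group = ON;
START GROUP_REPLICATION;
SET GLOBAL group_replication_bootstrap_group = OFF;

-- Verify group membership.
SELECT member_host, member_state FROM performance_schema.replication_group_members;
```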

One particularly interesting example is MySQL Cluster, which achieves strong consistency (real-time synchronization) through the data nodes provided by NDB Cluster, while at the SQL node level multiple masters are deployed to avoid single points of failure. In this sense, MySQL becomes a distributed system in the “true” sense.

The picture below is the general architecture diagram given in the official documentation.

Earlier we pointed out a problem with the asynchronous mode: data collisions caused by multiple masters processing transactions independently and then synchronizing. In the next article we will expand on this topic and discuss how the whole system can reach consensus before a node commits a transaction, so as to avoid consistency problems after the commit.

Summary

We have walked through the mainstream MySQL deployment patterns, from the original single instance, to one master with many slaves, to multi-master synchronization, and finally to distributed cluster solutions. Below, somewhat subjectively, I summarize the key problem each step tries to solve.

Singleton deployment -> One master with multiple slaves

This step solves the problem of excessive I/O pressure on a single node: combined with a client-side read/write-splitting solution, it raises the upper limit of the system’s load. However, the asynchronous replication scheme introduces consistency problems, and if the Master node suffers a single point of failure, the whole system will still be unavailable for a period of time.

One master with multiple slaves -> Multi-master synchronization

To remove the Master’s single point of failure in the one-master-multi-slave mode, multiple Master nodes are introduced, and additional HA schemes can let several Masters provide service simultaneously. However, under the default asynchronous replication, each Master commits transactions based only on its own state; once a network partition occurs, the Masters may each maintain a separate copy of the data, causing inconsistency.

Multi-master synchronization -> Distributed cluster

To solve the problem of multiple Master nodes being unable to agree on data operations, a consensus algorithm is introduced so that each Master reaches agreement with the other nodes across the network before committing a transaction, ensuring network-wide data consistency. At the same time, when a network partition occurs, HA mechanisms or leader-election algorithms can avoid or reduce split-brain problems.

Finally, allow me one last jab: having money really is sweet!

Look at the diagram above and compare it with Oracle’s scheme:

  • RAC provides redundant instances in the cluster to avoid single points of failure, improving A;
  • a shared SAN solves the problem of data consistency among the multiple instances, improving C.

Finally, software such as OGG and ADG helps solve data synchronization across systems and across network partitions. Apart from being expensive and hard to scale, the whole package seems to have no problems: application developers only need to care about writing good code, and architects only need to care about getting the budget for the solution approved, without worrying much about anything else.

Open source solutions, by contrast, are cheap, scalable, and transparent; the only downside is that they will probably cost you some hair.