[toc]
Creating original content is not easy; if you like it, please follow me. WeChat official account: Qiya Cloud Storage
Background
MongoDB is a full-featured distributed document database and one of the best-known NoSQL databases. Large-scale MongoDB deployments are becoming more and more common in China. MongoDB provides important database storage services for our company, supporting peak read/write loads of nearly ten million QPS and trillions of stored records every day.
MongoDB has clear advantages in high performance, dynamic scaling, high availability, ease of deployment, ease of use, and massive data storage. In recent years MongoDB has consistently ranked in the Top 5 of the DB-Engines popularity ranking, and its score keeps rising year after year, as shown in the following figure:
DB-Engines is a website that ranks the popularity of database management systems.
Ranking score:
MongoDB is the only non-relational database in the Top 5. Today we will look at MongoDB's high-availability architectures from a high level. By studying these architectures we can also get a sense of how distributed architectures in general have evolved.
High-availability architecture
High Availability (HA) means shortening the downtime caused by routine operation and maintenance (O&M) or unexpected faults, thereby improving the overall availability of the system.
Here comes the question: everyone claims their service is highly available, but can high availability actually be measured? Can it be compared?
Yes, this is where the SLA comes in. SLA stands for Service Level Agreement: an agreement negotiated between a service provider and its users that quantifies availability. The SLA is an important indicator of service quality.
So how is an SLA quantified? It is calculated from the allowed downtime. How does that work? Here is an example:
```
1 year = 365 days = 8760 hours
99.9%   allowed downtime: 8760 * 0.1%   = 8760 * 0.001   = 8.76 hours
99.999% allowed downtime: 8760 * 0.001% = 8760 * 0.00001 = 0.0876 hours ≈ 5.26 minutes
```
In other words, if a public cloud vendor provides an object storage service with a five-nines (99.999%) availability SLA, it must ensure that the service is down for less than 5.26 minutes per year. If the downtime exceeds 5.26 minutes, the SLA is violated and the customer can claim compensation.
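The arithmetic above is easy to reproduce. Here is a minimal Python sketch (illustrative only; the function name is my own) that converts an availability percentage into the allowed downtime per year:

```python
# Illustrative sketch: convert an SLA availability percentage into allowed yearly downtime.
HOURS_PER_YEAR = 365 * 24  # 8760 hours

def allowed_downtime_hours(availability_percent: float) -> float:
    """Maximum downtime (in hours) per year for a given availability SLA."""
    return HOURS_PER_YEAR * (1 - availability_percent / 100)

for sla in (99.9, 99.99, 99.999):
    hours = allowed_downtime_hours(sla)
    print(f"{sla}% availability -> {hours:.4f} hours, about {hours * 60:.2f} minutes per year")
# 99.999% -> 0.0876 hours, about 5.26 minutes, matching the calculation above.
```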
Back to the topic of high availability: in plain language, no matter what happens, the hosted business must not be affected. That is high availability.
As mentioned earlier, both high data reliability and high component availability come down to one solution: redundancy. With multiple components and multiple copies of data we can provide consistent, uninterrupted service to the outside world. Redundancy is the foundation, but how redundancy is used varies.
Based on different redundancy strategies, we can summarize several concrete MongoDB modes. These are general-purpose architectures that are also common in other distributed systems.
Let's introduce MongoDB's three high-availability modes one by one: Master-Slave, Replica Set, and Sharding. These three modes also reflect the evolution of high-availability architecture in distributed systems in general.
Master-Slave mode
The first redundancy strategy MongoDB provided is the Master-Slave strategy, which is also the earliest redundancy strategy in distributed systems. It is essentially a hot-standby strategy.
The Master-Slave architecture is generally used for backup or read/write splitting, usually with one master and one slave, or one master and multiple slaves.
Master-Slave consists of the following roles:
Master
Readable and writable. When data changes, the Master synchronizes the oplog to all connected slaves.
Slave
Read-only. All slaves synchronize data from the Master, and slave nodes are not aware of each other.
As shown in the figure:
As the diagram above shows, this is a typical fan-out structure.
Master-Slave: thoughts on read/write splitting
The Master provides both read and write services. If there are multiple Slave nodes, the Slaves can also be used to serve reads.
Think about it: what is wrong with this kind of read/write splitting?
There is one insurmountable problem: data inconsistency. The root cause is that only the Master node can accept writes, while Slave nodes can only synchronize data from the Master and serve reads, and this synchronization process is asynchronous.
Although the data will eventually be synchronized to the Slave, there is a window before it becomes fully consistent during which reads from the Slave return stale data. To sum up: this read/write-splitting structure only suits specific scenarios and is not suitable for scenarios that require strong data consistency.
Master-Slave: thoughts on disaster recovery
When the Master node fails, the Slave still holds a copy of the data, so the situation is manageable: as long as the data is intact, we can answer to the user. After the Master fails, an operator can manually inspect the cluster and promote a Slave to be the new Master, so the cluster can provide service again.
What are the characteristics of this model?
- Master-Slave defines only two roles: Master and Slave;
- The roles are statically configured and cannot change automatically;
- Users can only write data to the Master node, and Slave nodes can only pull data from the Master node;
- Another key point: Slave nodes only communicate with the Master and are not aware of each other. The advantage is that this keeps the Master very light; the disadvantage is that the system has an obvious single point, and the many Slaves can only pull data from the Master without being able to make any decisions of their own.
What are the problems with these features?
The first and biggest problem is poor availability. The reason is easy to understand: when the Master node fails, failover has to be handled manually, which means a huge downtime window.
Current status of Master-Slave
Master-Slave mode has been deprecated since MongoDB 3.6, and master-slave replication for sharded cluster components has been deprecated since MongoDB 3.2. In Master-Slave mode, the Master cannot recover automatically after it goes down; manual intervention is required, reliability is poor, and a mis-operation may even cause data loss.
How do you set up Master-Slave mode?
Start the Master node:
```
mongod --master --dbpath /data/masterdb/
```
Key parameters:
- --master: specifies the Master role.
Start the Slave node:
```
mongod --slave --source <masterhostname><:<port>> --dbpath /data/slavedb/
```
Key parameters:
- --slave: specifies the Slave role.
- --source: specifies the replication source, i.e. the address of the Master.
Replica Set mode
Roles in Replica Set mode
A Replica Set is a group of mongod instances consisting of three types of node roles:
Primary (Primary node)
Only the Primary is both readable and writable. The Primary receives all write requests and then replicates the data to all Secondaries. A Replica Set has only one Primary node; when it fails, the remaining Secondary and Arbiter nodes elect a new Primary so that service can resume.
By default, read requests are also sent to the Primary. If you deliberately want reads to go to a Secondary, the client has to change its configuration (note: this is a client-side setting, so the decision lies with the client).
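To make this concrete, here is a small sketch using the Python driver (PyMongo); the host names, database, and collection are placeholders made up for the example, not anything from the original article:

```python
from pymongo import MongoClient, ReadPreference

# Connect to a Replica Set; hosts and set name are placeholders.
client = MongoClient("mongodb://node1:27017,node2:27017,node3:27017/?replicaSet=rs0")

# Default behaviour: reads go to the Primary.
orders = client.mydb.orders

# Explicit client-side choice: this handle may read from Secondaries.
orders_from_secondary = client.mydb.get_collection(
    "orders", read_preference=ReadPreference.SECONDARY_PREFERRED
)
```

The decision really does live in the client: nothing on the server changes between the two handles.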
Doesn't that make the Primary, just like the Master in Master-Slave mode, look like a single point?
The biggest difference from Master-Slave mode is that the Primary role is elected by the whole cluster: every node can become Primary, every node starts out as just a Secondary, and this election process is completely automatic with no human involvement.
Secondary (replica node)
Data replica nodes, which participate in the primary election when the primary node fails.
Consider a question: what is the difference between a Secondary and the Slave role in Master-Slave mode?
The most fundamental difference is that Secondaries exchange heartbeats with one another, and a Secondary can itself serve as a replication source, so a Replica Set can use chained replication.
Arbiter
An Arbiter stores no data and cannot be elected Primary; it only votes in elections. Using an Arbiter reduces the redundancy of data copies while still providing high availability.
The diagram below:
Thoughts on Replica Set mode features
The Replica Set model of MongoDB mainly has the following characteristics:
- There are multiple copies of the data, so service can be restored after a fault. Note: this fault recovery is automatic;
- Read/write splitting: read requests can be routed to replicas to reduce the read pressure on the Primary;
- Nodes exchange heartbeats with each other directly and can sense the overall state of the cluster.
Consider: What are the pros and cons of this?
Availability is greatly enhanced: when the Primary fails, recovery is automatic and a new Primary is elected immediately. But note one thing: every pair of nodes exchanges heartbeats, so the number of heartbeat connections grows roughly with the square of the node count. A single Replica Set therefore should not be too large; generally the number of nodes should not exceed 50.
Think about it: does the number of nodes matter?
Yes, it is important that the number of voting nodes is odd. Why? Because an even number of voters can produce a split vote, i.e. a tie, which risks split brain.
For example, with 3 voters the result must be 2:1, so one side always has the majority; with 4 voters the result may well be 2:2, a tie.
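As a sketch of what an odd voting configuration looks like in practice, the snippet below (PyMongo, with placeholder host names) initiates a Replica Set with three voting members: two data-bearing nodes plus one Arbiter. It is an illustrative example under those assumptions, not a command taken from the article:

```python
from pymongo import MongoClient

# Connect directly to one freshly started mongod instance (placeholder host).
client = MongoClient("mongodb://node1:27017/?directConnection=true")

# Three voting members (an odd number): two data nodes and one arbiter.
config = {
    "_id": "rs0",
    "members": [
        {"_id": 0, "host": "node1:27017"},
        {"_id": 1, "host": "node2:27017"},
        {"_id": 2, "host": "node3:27017", "arbiterOnly": True},
    ],
}
client.admin.command("replSetInitiate", config)
```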
Sharding mode
In principle, the Replica Set mode already solves the availability problem quite well. Why evolve further? Because in today's big-data era there is another problem that must be considered: the volume of data.
The amount of user data keeps growing and in theory has no upper limit, but a Replica Set does have one. What does that mean?
For example, suppose a single machine has 10 TiB of disk space, 500 GiB of memory, and a 40 Gbps network card. That is the physical limit of the machine: once the data volume exceeds 10 TiB, the Replica Set can no longer serve it. You might say, fine, just add disks or use bigger disks. Sure, but the capacity and performance of a single machine always have a physical ceiling (for instance, the chassis may hold at most 60 disks). What do we do when the single machine itself becomes the bottleneck?
The solution: use distributed technology.
To address performance and capacity bottlenecks, there are generally two directions of optimization:
- Vertical optimization
- Horizontal optimization
Vertical optimization is the most common approach in traditional enterprises: keep increasing the capacity and performance of individual disks and machines. CPU frequencies keep rising, core counts keep growing, disk capacity has gone from 128 GiB to today's common 12 TiB, memory has gone from megabytes to hundreds of gigabytes, and network bandwidth has gone from 100 Mbps to 10 Gbps. Yet these improvements still cannot keep up with the growth of data on the Internet.
Horizontal optimization solves the problem by adding nodes and scaling out. The data set is partitioned and processed across multiple servers, so capacity and throughput grow in proportion to the number of machines. The speed or capacity of any single machine may not be high, but each machine only handles a portion of the total workload, so the whole may deliver more than a single high-speed, high-capacity server.
To expand capacity you only need to add servers as required, which costs less than a single high-end machine; the price is more complex supporting software, deployment, and maintenance.
So, in practice, which is more feasible?
Naturally the distributed approach is the way to go: vertical optimization quickly hits physical limits, whereas horizontal optimization demands little of any individual machine and gets its effect from the group (though it places higher demands on the software architecture).
In 2003, Google published the Google File System paper. GFS is a scalable distributed file system for large, data-intensive distributed applications; it runs on inexpensive commodity hardware and provides fault tolerance. GFS formally opened the door to the practical application of distributed technology.
MongoDB's Sharding mode is its architectural implementation of horizontal scaling. Let's look at how the Sharding mode differs from the Replica Set mode.
Roles in Sharding mode
By layer, Sharding mode can be divided into three modules:
- Proxy layer: Mongos
- Configuration center: a Replica Set cluster (mongod)
- Data layer: Shard clusters
A simplified diagram is as follows:
The proxy layer:
The proxy-layer component is called Mongos. It is stateless and does nothing but routing. When it receives a write request from a client, it uses a specific algorithm to determine which Shard cluster the data belongs to, then writes the data to that Shard cluster. When it receives a read request, it locates the Shard that holds the requested data and forwards the request there to read the data.
The data layer:
What is the data layer? It is where the data actually lives. You may be surprised to find that the data layer is made up of Replica Set clusters. As we said earlier, a single Replica Set has a limit, so what do we do? Deploy multiple Replica Sets. Each such Replica Set is called a Shard, and in theory the number of Replica Sets can grow without bound.
The configuration center:
The proxy layer is stateless, and each Shard in the data layer is independent of the others, so there must be somewhere that manages the cluster-wide configuration: the configuration center. What does it store?
For example: how many Shards there are, which nodes make up each Shard cluster, and roughly how much data each Shard holds (used for balancing). All of this lives in the configuration center.
The configuration center stores the cluster topology and manages configuration information. This information is so important that it cannot live on a single point either, so the configuration center is itself a Replica Set cluster, with its data kept in multiple copies.
Detailed architecture diagram:
How is data stored in Sharding mode?
As we said, vertical optimization is the friendliest to the software, while horizontal optimization places higher demands on it: the software architecture has to adapt.
A single Shard cluster is limited, but the number of Shards is not, so in theory MongoDB can provide nearly unlimited space by continuously scaling out. The only remaining question is how to distribute user data across these Shards. What does MongoDB do?
First, a field (or combination of fields) is chosen as the Key; it can be any field you specify. We then use this Key, through some strategy, to decide which Shard the data should go to. This strategy is called the Sharding Strategy.
We take the Sharding Key as input and compute a value according to the chosen Sharding Strategy. The set of all possible values forms a value range, which is cut into slices of a fixed step size; each slice is called a Chunk. Each Chunk is bound to a Shard when it is created, and that binding is stored in the configuration center.
So MongoDB uses the Chunk as an extra layer of abstraction that decouples user data from the Shard it physically lives on. The Sharding Strategy first determines which Chunk a piece of user data falls into; since at any given moment a Chunk belongs to exactly one Shard, we then know which Shard stores that user data.
Data writing process in Sharding mode:
Data reading process in Sharding mode:
As the figures above show, Mongos is essentially a path-finding routing module. On a write, it first computes which Chunk the user's key belongs to, then looks up which Shard owns that Chunk, and finally sends the request to that Shard so the data can be written. Reads work the same way: compute the Chunk for the key, look up the owning Shard, and send the request there to read the data.
In practice, Mongos does not have to contact the Config Server on every request; in most cases it simply caches the Chunk mapping table in memory, which saves one network round trip and improves performance.
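To make the routing idea concrete, here is a purely illustrative Python sketch of how a Mongos-like router could use a cached Chunk table to map a key to a Shard. The boundaries, shard names, and hash choice are all invented for the example; this is not MongoDB's actual implementation:

```python
import hashlib
from bisect import bisect_right

# Cached chunk table: sorted chunk lower bounds and the shard owning each chunk.
CHUNK_LOWER_BOUNDS = [0, 2**62, 2 * 2**62, 3 * 2**62]
CHUNK_OWNERS = ["shard-a", "shard-b", "shard-c", "shard-a"]

def hashed_key(key: str) -> int:
    """Hash the sharding key into a 64-bit integer value space."""
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")

def route(key: str) -> str:
    """Find the chunk whose range contains the hashed key, then return its shard."""
    value = hashed_key(key)
    chunk_index = bisect_right(CHUNK_LOWER_BOUNDS, value) - 1
    return CHUNK_OWNERS[chunk_index]

print(route("user_42"))  # prints the shard that would receive this document
```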
Why is there an extra layer of abstraction called Chunk?
For flexibility. If user data were mapped directly to a Shard, it would be tied to that physical location. What happens when the Shard runs out of space?
The data could not be stored there, and it could not be stored anywhere else either. Some readers might think: fine, just record a new mapping whenever data is moved. In theory you could, but keeping a per-record mapping to a Shard is of enormous magnitude and is not feasible in practice.
With the extra Chunk layer, things become flexible: user data is no longer tied to a physical location, only to a Chunk. If a Shard's data becomes unbalanced, a Chunk can be split, half of its data migrated to another Shard, and the Chunk-to-Shard mapping updated. There are only a small number of Chunk-to-Shard mapping entries, so they can easily be held, and this balancing process is completely invisible to users.
So what exactly is a Sharding Strategy? In essence, it is just the strategy used to form the value range. MongoDB supports two Sharding Strategies:
- Hashed Sharding
- Range Sharding
Hashed Sharding
The Key is fed into a hash function to compute an integer. The set of possible integers forms a value range, which is cut into slices of a fixed step size; each slice is a Chunk, and here a Chunk is simply a range of integers.
What are the advantages and disadvantages of this way of calculating the range?
The benefits are:
- Fast calculation speed
- Good balance, since placement is essentially random
The downside is:
- Because placement is random, sorted scans perform very poorly. For example, listing data ordered by name ends up touching almost every Shard.
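For reference, sharding a collection with a hashed key is requested through the shardCollection admin command against a mongos; below is a small PyMongo sketch with placeholder database, collection, and field names:

```python
from pymongo import MongoClient

# Connect to a mongos router (placeholder address).
client = MongoClient("mongodb://mongos1:27017")

# Enable sharding on the database, then shard the collection on a hashed key.
client.admin.command("enableSharding", "mydb")
client.admin.command("shardCollection", "mydb.users", key={"name": "hashed"})
```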
Range Sharding
**The Range method essentially uses the Key itself as the value, forming the Key Space.**
In the example above, the field name is chosen as the Sharding Key. Keys such as “test_0”, “test_1”, and “test_2” sort next to one another, so they are all placed in the same Chunk.
These three documents will most likely end up in one Chunk, since they are ordered by name. What are the pros and cons of this approach?
The benefits are:
- It is very friendly to sorted-scan scenarios: the data is laid out in order on the Shard, so a sorted scan can read it sequentially and very quickly;
The downside is:
- It can easily create hot spots. For example, if most Sharding Keys share the same prefix, they will very likely land on the same Shard, so all writes hammer that one Shard while the other Shards sit idle, unable to help.
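Range sharding is requested the same way, except the shard key is an ordinary ascending key instead of a hashed one (again a PyMongo sketch with placeholder names):

```python
from pymongo import MongoClient

client = MongoClient("mongodb://mongos1:27017")  # placeholder mongos address

# Range sharding: documents are partitioned by the ordered value of "name".
client.admin.command("enableSharding", "mydb")
client.admin.command("shardCollection", "mydb.users_by_name", key={"name": 1})
```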
Availability improved further
Why does Sharding mode not only solve the capacity problem but also further improve availability?
Because there are many Shard (Replica Set) clusters, even if one or more Shards become unavailable, the MongoDB cluster can still serve reads and writes on the rest. And since every Shard has its own Primary and can accept writes, availability is improved even further.
Recommended usage
Master-Slave mode is no longer recommended; Replica Set and Sharding modes can both ensure high data reliability and availability. However, in practice we have found that the client holds a great deal of configuration power. In other words, if you use MongoDB the wrong way, it may not meet your expectations.
Usage point 1: how to ensure high availability?
In Replica Set mode, the client needs to actively detect primary/secondary switchovers. With a certain version of the Go MongoDB client SDK, I found that it did not detect the switchover at all, so requests kept going to the failed node and the service became unavailable.
So how do we handle this situation? There are two options:
- Use Sharding mode: the user talks to Mongos, a proxy that shields you from the details of the underlying Replica Sets and handles primary/secondary switchovers for you;
- Make the client aware of the topology and refresh it periodically (which is relatively troublesome); a minimal connection sketch follows this list.
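For the second option, modern drivers can track the Replica Set topology for you as long as the connection string names the set. The sketch below shows this with PyMongo and placeholder hosts; the retry handling is a simplification, not a complete production pattern:

```python
from pymongo import MongoClient
from pymongo.errors import AutoReconnect

# Naming the replica set lets the driver monitor all members and
# rediscover the new Primary after a switchover.
client = MongoClient(
    "mongodb://node1:27017,node2:27017,node3:27017/?replicaSet=rs0",
    serverSelectionTimeoutMS=5000,
)

try:
    client.mydb.orders.insert_one({"order_id": 1})
except AutoReconnect:
    # A failover is in progress; the application can retry once the driver
    # has discovered the new Primary.
    pass
```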
Usage point 2: how to ensure high data reliability?
The client must configure writes so that they only count as successful once a majority of nodes have acknowledged them. Yes, this power too is given to the client. If you do not configure majority writes, a write may be acknowledged after landing on only one node; if a fault then occurs or the primary is switched over, that data may be rolled back by the new primary, which for the user amounts to data loss.
MongoDB has a complete rollback and write-policy mechanism, but it must be used correctly. How do we ensure high reliability? Treat a write as successful only when it has succeeded on a majority of nodes.
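With PyMongo, for example, the write concern is a client-side setting; the sketch below (placeholder names) only treats a write as successful after a majority of members have acknowledged it:

```python
from pymongo import MongoClient
from pymongo.write_concern import WriteConcern

client = MongoClient("mongodb://node1:27017,node2:27017,node3:27017/?replicaSet=rs0")

# w="majority": the write is acknowledged only once a majority of members have it,
# so it will not be rolled back by a later primary switchover.
orders = client.mydb.get_collection("orders", write_concern=WriteConcern(w="majority"))
orders.insert_one({"order_id": 2, "amount": 99})
```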
Usage point 3: how to ensure strong data consistency?
The client must configure two things:
- A write counts as successful only when a majority of nodes acknowledge it;
- Reads use strong mode, i.e. they go only to the Primary node;
Only with both of these settings can user data be kept truly safe and strong consistency be provided externally.
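Putting both settings together in PyMongo (again with placeholder names) looks roughly like this:

```python
from pymongo import MongoClient, ReadPreference
from pymongo.write_concern import WriteConcern

client = MongoClient("mongodb://node1:27017,node2:27017,node3:27017/?replicaSet=rs0")

# Majority writes plus primary-only reads: the two settings described above.
orders = client.mydb.get_collection(
    "orders",
    write_concern=WriteConcern(w="majority"),
    read_preference=ReadPreference.PRIMARY,
)
```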
Conclusion
- This article introduced MongoDB's three high-availability architectures: Master-Slave mode, Replica Set mode, and Sharding mode, which also reflect a common path of architecture evolution.
- MongoDB's Master-Slave mode is not recommended; newer versions no longer even support this redundancy mode.
- Replica Set improves reliability through multiple data copies and component redundancy, and uses a distributed automatic primary-election algorithm to shrink the downtime window and improve availability.
- Sharding mode provides users with nearly unlimited storage space through horizontal scaling.
- The MongoDB client holds a great deal of configuration power. Only by specifying the write-majority policy and strong-mode reads (reading only from the Primary) can high data reliability and consistency be guaranteed.
Afterword
Today we analyzed MongoDB's high-availability architectures from a high level; these are also the most common architecture patterns in distributed systems. As one of today's most popular NoSQL databases, MongoDB has a great deal worth learning from. I will take the opportunity later to analyze it in depth from the perspective of principles and practice.