This article translated from mongo website document: docs.mongodb.com/v4.4/replic… , with a slight modification and an additional example.

An overview of the

Replica set of MongoDB is a group of Mongod processes that maintain the same data set. Replica sets ensure redundancy and high availability, which are basically the basic requirements of a production environment. This article describes replica sets, components, and their architecture.

Redundancy and data availability

Duplicates provide redundancy and increase data availability. With multiple copies of data from different database servers, the replica set is more fault-tolerant than a single database server.

In some cases, because clients can send read requests to different servers, duplicates can increase the number of concurrent reads. Maintaining data copies in different data centers is important for distributed applications. Enables remote data backup and improved availability. Maintaining additional copies can also be used for purposes such as disaster recovery, reporting, or backup.

Mongo’s copy

A replica set is essentially a group of Mongod processes that maintain the same data set. A replica set consists of multiple data fault-tolerant nodes and an optional quorum node. Among data fault tolerant nodes, one and only one node is considered the master node, while the others are the slave nodes.

The primary node receives all write operations. A replica set can have only one primary node confirming a write with the {w: “majority”} instruction. In some cases, another Mongod process may consider itself the master node for a short period of time (there may be two nodes competing for the master node, but only the one that completes the write operation is the actual master node). The primary node records all operations on the dataset in an operation log.

The slave node replicates the master node’s operation log and performs the same operation on their data set, thus maintaining consistency with the master node’s data set. If the master node is not available, a qualified slave node will hold an election to promote itself to the new master node.

In some cases (such as when you have a master node and a slave node, but can’t add any more slave nodes due to cost), we choose a Mongod instance as the arbiter. The arbiter only participates in the election and does not hold the data. Under normal circumstances. The role of arbiter will not change. However, when the primary node goes offline, it becomes a secondary node to participate in voting (only participating in voting), which affects the voting result and elects another secondary node as the primary node. This is usually used when the number of examples is even, in which case there may be a unanimous vote, which requires an arbiter to break the balance.

Asynchronous replication

The slave node asynchronously replicates the operation logs from the master node and applies the operations to its own data set. Because the secondary node retains the same set of data as the primary node, the replica set continues to function even if one or more nodes fail.

It takes a certain amount of time for the secondary node to replicate from the primary node, resulting in replication delay. Replication latency refers to the time required by the secondary node to replicate write operations from the primary node. Small replication delays are generally acceptable, but increasing replication delays can cause serious emergency problems, including increased cache build pressure on the master node.

Starting with MongoDB 4.2, administrators can limit the frequency of write operations on the primary node to maintain maximum commit delays. This would be delay by flowControlTargetLagSeconds configuration, this is the flow control. Flow control is enabled by default. After open the flow control, time delay once nearly to the flowControlTargetLagSeconds, the master node before getting to write lock must obtain tickets to line up. Flow control attempts to keep the delay below a set target by limiting the number of queued tickets generated per second.

Automatic troubleshooting

When a master node (electionTimeoutMillis, the default is 10 seconds), if you do not communicate with other members, you are considered dead. At this point, a qualified node holds a vote and nominates itself as the new master node. The cluster will attempt to elect a new master node to resume normal operations.During the election, the replica set cannot handle writes, but reads can be done if the query configuration is not running on the slave node. In general, the election of a new primary should take no more than 12 seconds — the default time set by the replica set. This includes marking the master node as unavailable and initiating and completing the selection process. You can modifysettings.electionTimeoutMillisTo change the time. Factors such as network latency can also lengthen the time. Because of these impacts, the cluster may work without a master node.

Turning electionTimeoutMillis down can quickly detect primary node failures, but it can also lead to more frequent primary node elections — especially for occasional network delays that can occur even when the primary node is not down. The application of connection logic should consider tolerating automatic troubleshooting and the subsequent election process. Since version 3.6, MongoDB drivers detect primary node failures and automatically attempt a write, adding additional built-in automatic fault and election handling. In version 4.2 and later, this feature is automatically enabled, while in version 3.6 to 4.0, retryWrites=true is required in the connection character attribute.

Since version 4.4, MongoDB has provided mirror reads to warm up participating slave nodes to cache recently accessed data. Warming up the cache of the slave node can speed up the recovery after the election.

Creating replica sets

The format for creating a replica set is as follows:

mongod --replSet "rs0" --bind_ip localhost,<hostname(s)|ip address(es)> --port <port>
Copy the code

You can also start using a configuration file:

replication:
   replSetName: "rs0"
net:
   bindIp: localhost,<hostname(s)|ip address(es)>
Copy the code

If bind_IP is not specified, the default value is the local host. If port is not specified, the default value is 27017. For example, create three nodes on the local machine:

# Node 1 (the first node will be set as the primary node)
mongod --replSet s0 --dbpath mongodb --port 27017 --oplogSize 100
2 # node
mongod --replSet s0 --dbpath mongodb1 --port 27018 --oplogSize 100
3 # node
mongod --replSet s0 --dbpath mongodb2 --port 27018 --oplogSize 100
Copy the code

Client connection

You can use mongo –port to connect to a specific instance, which will indicate whether the primary or secondary node is currently connected.

# Primary node client
s0: PRIMARY>
# Secondary node client
s0:SECONDARY>
Copy the code

If we perform a write operation on a secondary node, it will prompt: “NotWritablePrimary”. Note that the secondary node is not readable by default and will be read with an error: NotPrimaryNoSecondaryOk. If the read function is temporarily enabled, it needs to be performed on the secondary node’s client (earlier version is rs.slaveok ()) :

rs.secondaryOk();
# or the following command
db.getMongo().setSecondaryOk();
Copy the code

Monitoring node status:

var config = {"_id":"s0"."members": []}; config.members.push({"_id": 0."host": "localhost:27017"});
config.members.push({"_id": 1, "host": "localhost:27018"});
config.members.push({"_id": 2."host": "localhost:27019"});
rs.initiate(config);
Check node status
rs.status();
Copy the code

conclusion

This article introduced the basic concepts and features of MongoDB replica sets and demonstrated how to create replicas and how clients connect to multiple nodes.