This article is translated from the MongoDB documentation (docs.mongodb.com/v4.4/replic…), with slight modifications and an additional example.

Preface

This article introduces the basic concepts of MongoDB replica sets, with examples. A replica set provides a highly available data service: the primary node accepts writes, the secondary nodes replicate them, and reads can be served by multiple nodes. In addition, when the primary node goes down, the secondary nodes automatically elect a new primary to resume read and write service. This part covers read operations on a replica set.

Read Preference

By default, the client reads data from the primary node. But as we demonstrated in the last article, you can also have the client read from a secondary node, for example by connecting to that node directly:

mongo --port <port> --host <host | IP>

However, because data is replicated asynchronously from the primary to the secondaries, a client may read different data from a secondary than from the primary. Multi-document transactions that include read operations must read from the primary, and all operations in a given transaction must go to the same node. There are five read preference modes:

Primary mode (primary): All read operations use the current primary node. This is the default mode. If the primary node is unavailable, read operations produce an error or throw an exception. The primary mode is not compatible with read preferences that use tag sets or maxStalenessSeconds.

Primary preferred mode (primaryPreferred): In most cases, data is read from the primary node. If the primary is unavailable, reads switch to secondary nodes that satisfy the maxStalenessSeconds and tag set settings. When the read preference includes a maxStalenessSeconds value and there is no primary, the client estimates how stale each secondary is by comparing its last write with that of the secondary that has the most recent write. The client then reads from a secondary whose estimated lag is less than or equal to maxStalenessSeconds.

When the read preference contains a list of tag sets and the primary is unavailable, the client tries to find secondary nodes that match the tags, trying each tag set in order until one of them matches. If matching secondaries are found, the client picks one at random from the nearest group of matching secondaries and reads from it. If none is found, the read operation produces an error.
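The in-order tag set matching described above can be sketched in plain JavaScript. This is only an illustration of the rule, not the driver's actual implementation; the host names, tag values, and the function names matchesTagSet and selectByTagSets are made up for the example.

```javascript
// Illustrative sketch only, not the driver's real implementation.
// The members below and their tags are hypothetical sample data.

// Returns true when every key/value pair of tagSet is present in memberTags.
// An empty tag set ({}) therefore matches every member.
function matchesTagSet(memberTags, tagSet) {
  return Object.entries(tagSet).every(([key, value]) => memberTags[key] === value);
}

// Tries each tag set in order and returns the members that match the first
// tag set matching at least one member. Throws if no tag set matches.
function selectByTagSets(members, tagSets) {
  for (const tagSet of tagSets) {
    const matched = members.filter((m) => matchesTagSet(m.tags, tagSet));
    if (matched.length > 0) {
      return matched;
    }
  }
  throw new Error("no member matches any tag set");
}

const secondaries = [
  { host: "a.example.net:27017", tags: { datacenter: "A", region: "East" } },
  { host: "b.example.net:27017", tags: { datacenter: "C", region: "West" } },
];

// No member is in datacenter "B", so the second tag set (region "West") decides.
const chosen = selectByTagSets(secondaries, [{ datacenter: "B" }, { region: "West" }, {}]);
console.log(chosen.map((m) => m.host));
```

Ending the list with an empty tag set, as in the call above, means that some member is always eligible as long as at least one candidate exists.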

When both maxStalenessSeconds and a tag set are specified, the client first filters the secondaries by staleness and then by the tag set. Read operations that use primaryPreferred may return stale data; setting maxStalenessSeconds avoids reading from secondaries that the client estimates to be too stale.

Secondary mode (secondary): The client reads only from secondary nodes. If no secondary is available, the read operation produces an error or exception. Most replica sets have at least one secondary, but there can be times when none is available (for example, when all secondaries are down). If maxStalenessSeconds is set, the client estimates how stale each secondary is by comparing its last write with that of the primary, and then selects a secondary whose estimated lag is less than or equal to maxStalenessSeconds. If there is no primary, the secondary with the most recent write is used for the comparison.
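As a rough illustration of the staleness check, the sketch below filters secondaries by how far their last write trails a reference write. It is deliberately simplified: real drivers also take heartbeat timing into account, and the field names isPrimary and lastWriteMs, like the function name filterByMaxStaleness, are assumptions made for this example.

```javascript
// Simplified sketch: real drivers also factor heartbeat timing into the estimate.
// The field names (isPrimary, lastWriteMs) are hypothetical.

function filterByMaxStaleness(members, maxStalenessSeconds) {
  const secondaries = members.filter((m) => !m.isPrimary);
  const primary = members.find((m) => m.isPrimary);

  // Reference point: the primary's last write or, when there is no primary,
  // the most recent write seen on any secondary.
  const referenceMs = primary
    ? primary.lastWriteMs
    : Math.max(...secondaries.map((m) => m.lastWriteMs));

  // Keep only the secondaries whose estimated lag is within the limit.
  return secondaries.filter(
    (m) => (referenceMs - m.lastWriteMs) / 1000 <= maxStalenessSeconds
  );
}

const sampleMembers = [
  { host: "p", isPrimary: true, lastWriteMs: 1000000 },
  { host: "s1", isPrimary: false, lastWriteMs: 990000 }, // 10 s behind the primary
  { host: "s2", isPrimary: false, lastWriteMs: 800000 }, // 200 s behind the primary
];

// With a 120 second limit only s1 remains eligible.
console.log(filterByMaxStaleness(sampleMembers, 120).map((m) => m.host));
```

Note that MongoDB requires maxStalenessSeconds to be at least 90, so small values such as 10 would be rejected by a real deployment.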

When the secondary mode includes a list of tag sets, the client tries to find secondaries that match the tags, trying each tag set in order. If matching secondaries are found, the client picks one at random from the nearest group of matching secondaries and reads from it. If none is found, the read operation produces an error. Reads in secondary mode may also return stale data; you can set maxStalenessSeconds to avoid reading from secondaries that are too stale.

Secondary preferred mode (secondaryPreferred): In most cases, data is read from secondary nodes. But if no secondary is available, for example when the replica set consists of a single primary, data is read from the primary. When secondaries are available, the choice among them follows the same rules as in secondary mode.

Nearest mode (nearest): The client does not distinguish between primary and secondary nodes; it reads from a node whose network latency falls within the acceptable latency window. Because selection is based on network latency, this mode by default takes no account of how fresh a node's data is.
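The latency window can be pictured as follows. This is a minimal sketch, assuming each candidate carries a measured round-trip time in a hypothetical rttMs field; the 15 ms default mirrors the default value of the localThresholdMS setting, and the function names are invented for the example.

```javascript
// Sketch only; the rttMs values are hypothetical measurements.
// Keeps every member whose round-trip time is within localThresholdMs
// of the fastest member, regardless of primary or secondary role.
function withinLatencyWindow(members, localThresholdMs = 15) {
  if (members.length === 0) {
    return [];
  }
  const fastest = Math.min(...members.map((m) => m.rttMs));
  return members.filter((m) => m.rttMs <= fastest + localThresholdMs);
}

// Picks one candidate at random, as the client does within the window.
function pickRandom(candidates) {
  return candidates[Math.floor(Math.random() * candidates.length)];
}

const nodes = [
  { host: "near-1", rttMs: 5 },
  { host: "near-2", rttMs: 12 },
  { host: "far", rttMs: 40 },
];

const windowed = withinLatencyWindow(nodes); // near-1 and near-2 qualify
console.log(pickRandom(windowed).host);
```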

If maxStalenessSeconds is set, the client estimates how stale each secondary is. If there is a primary, each secondary's last write is compared with the primary's; if there is no primary, it is compared with that of the secondary with the most recent write. Nodes whose estimated lag exceeds maxStalenessSeconds are filtered out, and from the remaining nodes (primary and secondary alike), one whose network latency falls within the acceptable latency window is selected for the read.

If a tag set is specified, the client tries to find nodes that match the tag set and reads from one in the nearest group. If both maxStalenessSeconds and a tag set are specified, the client filters nodes by staleness first and then by the tag set. From the remaining mongod instances, the client randomly picks one that falls within the acceptable latency window.

Reads in this mode may also return stale data; you can reduce the likelihood of this by setting maxStalenessSeconds.

Configuring Read Preferences

If you are using a MongoDB driver, you can configure the read preference mode through the driver's API. In the mongo shell, you can set it with cursor.readPref() and Mongo.setReadPref().

db.getMongo().setReadPref(
   "secondary",                 // Secondary mode
   [
      { "datacenter": "B" },    // Match the data center tag first
      { "region": "West" },     // If not found, match the region tag
      { }                       // If still not found, the empty document matches all candidate nodes
   ]
)

db.collection.find({ }).readPref(
   "secondary",                 // Secondary mode
   [
      { "datacenter": "B" },    // Match the data center tag first
      { "region": "West" },     // If not found, match the region tag
      { }                       // Empty document as the last resort
   ]
)
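The same preference can also be carried in a connection string, using the readPreference, readPreferenceTags, and maxStalenessSeconds options. The helper below is a small sketch that assembles such a string; the function name buildReplicaSetUri, the host names, and the replica set name rs0 are placeholders chosen for the example.

```javascript
// Sketch: assembles a replica set connection string with read preference options.
// buildReplicaSetUri, the hosts, and the replica set name are hypothetical.
function buildReplicaSetUri({ hosts, replicaSet, readPreference, tagSets = [], maxStalenessSeconds }) {
  const params = [`replicaSet=${replicaSet}`, `readPreference=${readPreference}`];

  // Each tag set becomes its own readPreferenceTags option, in order.
  // An empty tag set is written as "readPreferenceTags=" with no value.
  for (const tagSet of tagSets) {
    const pairs = Object.entries(tagSet)
      .map(([key, value]) => `${key}:${value}`)
      .join(",");
    params.push(`readPreferenceTags=${pairs}`);
  }

  if (maxStalenessSeconds !== undefined) {
    params.push(`maxStalenessSeconds=${maxStalenessSeconds}`);
  }

  return `mongodb://${hosts.join(",")}/?${params.join("&")}`;
}

const uri = buildReplicaSetUri({
  hosts: ["db0.example.com:27017", "db1.example.com:27017"],
  replicaSet: "rs0",
  readPreference: "secondary",
  tagSets: [{ datacenter: "B" }, { region: "West" }, {}],
  maxStalenessSeconds: 120,
});
console.log(uri);
```

A string built this way can be passed to a driver or to the shell, so the read preference travels with the connection settings instead of being set per session or per query.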

Conclusion

As you can see, MongoDB replica sets offer several read preference modes for read operations. Tag sets are especially well suited to geographically distributed deployments. This flexibility is one of MongoDB's strengths: a cluster deployment can be configured in many ways to improve availability and reduce read latency.