An earlier version of this tutorial was written by Justin Ellingwood.

Introduction

MongoDB, also known as _Mongo_, is a document-oriented database used in many modern web applications. It is classified as a NoSQL database because it does not rely on a traditional table-based relational database structure. Instead, it uses JSON-like documents with dynamic schemas. This means that, unlike relational databases, MongoDB does not require a predefined schema before you add data to a database.

When working with databases, it is often useful to have multiple copies of your data. This provides redundancy in case one of the database servers fails, and it can improve a database’s availability and scalability and reduce read latency. The practice of synchronizing data across multiple separate databases is known as _replication_. In MongoDB, a group of servers that maintain the same data set through replication is called a _replica set_.

This tutorial provides a brief overview of how replication works in MongoDB and outlines how to configure and initiate a replica set with three members. In this example configuration, each member of the replica set will be a distinct MongoDB instance running on its own Ubuntu 20.04 server.

Note: The procedure outlined in this guide is intended to demonstrate how to get a replica set up and running quickly. After completing this tutorial, you will have a working replica set, but it will not have any security features enabled. This setup is not recommended for production environments.

The Community edition of MongoDB comes with two authentication methods that can help keep a database secure: _keyfile authentication_ and _X.509 authentication_. For production deployments that use replication, the MongoDB documentation recommends X.509 authentication, describing keyfiles as a “minimal form of security” that is “best suited for test or development environments.” However, obtaining and configuring X.509 certificates involves many considerations and decisions that must be made on a case-by-case basis, which puts the process beyond the scope of this tutorial.

If you plan to use your replica set for testing or development, we strongly recommend you follow our tutorial on how to configure key file authentication for MongoDB replica sets on Ubuntu 20.04.

Prerequisites

To complete this guide, you will need:

  • Three servers, each running Ubuntu 20.04. All three servers should have a non-root administrative user and a firewall configured with UFW. To set this up, follow our initial server setup guide for Ubuntu 20.04.
  • MongoDB installed on each of your Ubuntu servers. To do this, follow our tutorial on how to install MongoDB on Ubuntu 20.04, making sure to complete each step on all three servers.

Note that, for clarity, this guide refers to the three servers as Mongo0, Mongo1, and Mongo2. Any examples showing commands or file changes executed on Mongo0 will have a blue background.

Commands and file changes performed on Mongo1 will have a pink background.

Examples executed on Mongo2 will have a green background.

Finally, commands that must be run or file changes that must be made on every server will have a standard gray background.

Learn about MongoDB replica sets

As mentioned in the introduction, MongoDB handles replication through an implementation called _replica sets_. Each running MongoDB instance that belongs to a replica set is referred to as one of its _members_. Every replica set must have one _primary_ member and at least one _secondary_ member.

The primary member is the main point of access for transactions with the replica set, and it is the only member that can accept write operations. A replica set can have only one primary member at a time because replication works by copying the primary’s _oplog_ (short for “operations log”) and replaying the logged changes on the secondaries’ respective data sets. Multiple primaries accepting write operations could lead to data conflicts.

By default, applications direct both read and write operations only to the primary member. You can configure your setup to read from one or more secondary members, but because data is replicated asynchronously, reads from secondary nodes may return stale data. Such a configuration therefore isn’t ideal for every use case.
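
As a rough illustration of reading from secondaries, applications usually express this choice in their connection string. The sketch below assembles such a URI in plain JavaScript. The helper name `buildReplicaSetUri` is hypothetical, and the hostnames and the set name `rs0` are simply the example values used later in this guide; `replicaSet` and `readPreference` are standard MongoDB connection string options.

```javascript
// Hypothetical helper: builds a MongoDB connection URI for a replica set.
// The hostnames, port, and set name passed in below are illustrative assumptions.
function buildReplicaSetUri(hosts, replSetName, readPreference = "primary") {
  const seedList = hosts.join(",");
  const options =
    `replicaSet=${encodeURIComponent(replSetName)}` +
    `&readPreference=${encodeURIComponent(readPreference)}`;
  return `mongodb://${seedList}/?${options}`;
}

const uri = buildReplicaSetUri(
  [
    "mongo0.replset.member:27017",
    "mongo1.replset.member:27017",
    "mongo2.replset.member:27017",
  ],
  "rs0",
  "secondaryPreferred" // prefer secondaries for reads, accepting possibly stale data
);
console.log(uri);
```

Leaving the third argument out keeps the default behavior described above, where reads go to the primary.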

One feature that distinguishes MongoDB’s replica sets from other replication implementations is their automatic failover mechanism. If the primary member becomes unavailable, the secondary nodes automatically hold an election to choose a new primary. A replica set can have up to 50 members, but at most 7 of them can vote in elections.

However, if the pool of secondary members contains an even number of nodes, votes can tie and the set may be unable to elect a new primary. This is where a third type of member comes in: the arbiter. An arbiter is an optional replica set member that votes in such situations so that the set can reach a decision. Note, however, that arbiters do not hold a copy of the data set and can never become the primary. If a replica set has only one secondary member, an arbiter is required.
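
The voting arithmetic behind elections can be sketched in a few lines. This is a simplified model that assumes every member counted has exactly one vote and that a new primary needs a strict majority of all voting members; the function names are made up for illustration.

```javascript
// Votes required to elect a primary: a strict majority of the voting members.
function majorityNeeded(votingMembers) {
  return Math.floor(votingMembers / 2) + 1;
}

// Number of voting members that can be unavailable while an election can still succeed.
function faultTolerance(votingMembers) {
  return votingMembers - majorityNeeded(votingMembers);
}

for (const n of [2, 3, 4, 5]) {
  console.log(
    `${n} voting members: majority ${majorityNeeded(n)}, tolerates ${faultTolerance(n)} unavailable`
  );
}
```

Under this model, going from three voting members to four does not increase the number of failures the set can absorb, which is one reason an odd number of voting members is the usual choice.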

In some cases, you may not want all of your secondary members to follow the standard rules for secondaries in a replica set. MongoDB allows you to configure secondary members to take on the following non-standard roles.

  • Priority 0 replica set members. In some situations, electing certain set members as the primary could negatively affect your application’s performance. For example, if you replicate data to a remote data center, or a secondary member’s hardware isn’t adequate for it to serve as the set’s main point of access, setting its priority to 0 ensures that it never becomes the primary while it continues to replicate data.
  • Hidden replica set members. Sometimes you need to keep one set of members accessible and visible to clients while hiding background members that serve a separate purpose and shouldn’t receive reads. For example, you might need a secondary member to serve as the basis for analytics work, which benefits from an up-to-date data set but would strain a working member. Setting this member to hidden keeps it from interfering with the replica set’s general operation. Hidden members must have a priority of 0 so that they cannot become the primary, but they can still vote in elections.
  • Delayed replica set members. By setting the delay option on a secondary member, you control how long the secondary waits before applying each operation it copies from the primary’s oplog. This is useful if you want to guard against accidental deletions or recover from other destructive operations. For example, if you delay a secondary by half a day, it won’t immediately apply an accidental operation to its own data set, so you can use it to revert the change. Delayed members cannot become the primary, but they can vote in elections. In most cases, they should also be hidden so that applications don’t read stale data from them.
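
To make the three roles more concrete, here is a minimal sketch of what the corresponding member documents might look like in a replica set configuration, together with a small check of the rules listed above. The hostnames mongo3 through mongo5 and the `validateMember` helper are hypothetical. The `priority` and `hidden` fields are part of MongoDB’s member configuration; the delay field is named `slaveDelay` in MongoDB 4.4 and `secondaryDelaySecs` from 5.0 onward, so check which one your version expects.

```javascript
// Illustrative member documents; hostnames and _id values are made up.
const specialMembers = [
  // Priority 0: replicates data but can never be elected primary.
  { _id: 3, host: "mongo3.replset.member", priority: 0 },
  // Hidden: invisible to client applications, must also have priority 0.
  { _id: 4, host: "mongo4.replset.member", priority: 0, hidden: true },
  // Delayed by half a day (43200 seconds), also hidden.
  // Assumption: MongoDB 4.4 field name; 5.0+ uses secondaryDelaySecs instead.
  { _id: 5, host: "mongo5.replset.member", priority: 0, hidden: true, slaveDelay: 43200 },
];

// Hypothetical check of the rules described above:
// hidden and delayed members must have a priority of 0.
function validateMember(member) {
  const errors = [];
  const delay = member.slaveDelay || member.secondaryDelaySecs || 0;
  if (member.hidden && member.priority !== 0) {
    errors.push("hidden members must have priority 0");
  }
  if (delay > 0 && member.priority !== 0) {
    errors.push("delayed members must have priority 0");
  }
  return errors;
}

for (const member of specialMembers) {
  const errors = validateMember(member);
  console.log(member.host, errors.length === 0 ? "ok" : errors.join("; "));
}
```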

Step 1 – Configure DNS resolution

When you initiate the replica set in Step 4, you will need to provide an address at which each replica set member can be reached by the other two members. The MongoDB documentation recommends against using IP addresses when configuring a replica set, since IP addresses can change unexpectedly. Instead, MongoDB recommends using logical DNS hostnames.

One way to do this is to configure a subdomain for each replica set member. Although subdomains are ideal for production environments and other long-term solutions, this tutorial outlines how to configure DNS resolution by editing each server’s hosts file.

The hosts file is a special file that lets you map human-readable hostnames to numerical IP addresses. This means that if the IP address of any of your servers ever changes, you only need to update the hosts files on the three servers rather than reconfigure the replica set.

On Linux and other Unix-like systems, the hosts file is stored in the /etc/ directory. On each of your three servers, open the file with your preferred text editor. Here, we’ll use nano.

sudo nano /etc/hosts

After the first few lines, which configure localhost, add an entry for each member of the replica set. Each entry takes the form of an IP address followed by a human-readable name of your choice, like this.

/etc/hosts

IP_address   any_hostname

You can use whatever hostnames you like for your servers, but it helps to make each one descriptive. The examples in this guide use the following hostnames for the three servers.

  • mongo0.replset.member
  • mongo1.replset.member
  • mongo2.replset.member

With these hostnames, your /etc/hosts file will look similar to the following.

/etc/hosts

127.0.0.1 localhost
203.0.113.0 mongo0.replset.member
203.0.113.1 mongo1.replset.member
203.0.113.2 mongo2.replset.member
. . .
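
Because the same entries must appear on all three servers, it can help to check a hosts file mechanically. The following sketch parses hosts-style text and reports any replica set member that lacks an entry. The `parseHosts` helper is hypothetical, and the sample text simply mirrors the example file above.

```javascript
// Parses /etc/hosts-style text into a Map of hostname -> IP address.
function parseHosts(text) {
  const table = new Map();
  for (const rawLine of text.split("\n")) {
    const line = rawLine.split("#")[0].trim(); // drop comments and surrounding whitespace
    if (!line) continue;
    const [ip, ...names] = line.split(/\s+/);
    for (const name of names) table.set(name, ip);
  }
  return table;
}

// Sample content mirroring the example hosts file (documentation-range IPs).
const sample = `127.0.0.1 localhost
203.0.113.0 mongo0.replset.member
203.0.113.1 mongo1.replset.member
203.0.113.2 mongo2.replset.member`;

const expected = [
  "mongo0.replset.member",
  "mongo1.replset.member",
  "mongo2.replset.member",
];
const table = parseHosts(sample);
const missing = expected.filter((name) => !table.has(name));
console.log(missing.length === 0 ? "all members have an entry" : `missing: ${missing.join(", ")}`);
```

On a real server, the text could be read from /etc/hosts instead of the inline sample.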

If you don’t know your servers’ IP addresses, you can run the following curl command on each server to retrieve them. icanhazip.com is a website that shows the IP address of whichever computer is used to access it. When you pass its URL as an argument to curl, the command prints the IP address of the server you run it on to standard output.

curl -4 icanhazip.com

If you use DigitalOcean Droplets, you can also find the IP address of your server in the control panel.

The new lines you add here should be the same on all three servers. Save and close the file on each server. If you used nano to edit the files, do so by pressing CTRL + X, then Y, then ENTER.

After editing, saving, and closing the hosts file on each server, you have finished configuring DNS resolution for the replica set. You can now move on to updating each server’s firewall rules so that the servers can communicate with one another.

Step 2 – Update the firewall configuration for each server with UFW

Assuming you followed the prerequisite initial server setup guide, you will have set up a firewall on each of the servers where you installed MongoDB and allowed access for the OpenSSH UFW profile. This is an important security measure: these firewalls currently block connections to every port on your servers except SSH connections that present keys matching those in each server’s respective authorized_keys file.

However, these firewalls also prevent the MongoDB instances on each server from communicating with one another, which stops you from initiating the replica set. To correct this, you need to add new firewall rules that allow each server to access the port on which MongoDB is listening on the other two servers.

On Mongo0, run the following ufw command to allow Mongo1 to access port 27017 on Mongo0.

sudo ufw allow from mongo1_server_ip to any port 27017

Be sure to change mongo1_server_ip to reflect the actual IP address of your Mongo1 server. Note that the ufw command does not work with the hostnames configured in the hosts file, so use your servers’ actual IP addresses in this command and the ones that follow. Also, if you have changed the MongoDB instance on this server to use a non-default port, change 27017 to the port your MongoDB instance actually uses.

Then add another firewall rule to give Mongo2 access to the same port.

sudo ufw allow from mongo2_server_ip to any port 27017

Next, update the firewall rules on the other two servers. Run the following commands on Mongo1, making sure to change the IP addresses to those of Mongo0 and Mongo2, respectively.

sudo ufw allow from mongo0_server_ip to any port 27017
sudo ufw allow from mongo2_server_ip to any port 27017

Finally, run these two commands on Mongo2. Again, make sure you enter the correct IP address for each server.

sudo ufw allow from mongo0_server_ip to any port 27017
sudo ufw allow from mongo1_server_ip to any port 27017

After adding these UFW rules, each of your three MongoDB servers will be allowed to access the port used by MongoDB on the other two servers. However, you can’t test this yet, because the MongoDB instance on each server currently blocks all external connections. You will be able to run this test after enabling replication by updating each MongoDB instance’s configuration file in the next step.
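
The pattern in this step is that every server admits the other two on the MongoDB port. As a sketch, the commands for all three servers can be generated from a single list of addresses. The `ufwRulesFor` helper is hypothetical, and the IP addresses are the documentation-range placeholders from Step 1, so substitute your servers’ real addresses.

```javascript
// Placeholder addresses; replace with your servers' actual IPs.
const serverIps = {
  mongo0: "203.0.113.0",
  mongo1: "203.0.113.1",
  mongo2: "203.0.113.2",
};

// For each server, list the ufw commands that admit every other member on the MongoDB port.
function ufwRulesFor(servers, port = 27017) {
  const plan = {};
  for (const name of Object.keys(servers)) {
    plan[name] = Object.entries(servers)
      .filter(([peer]) => peer !== name)
      .map(([, ip]) => `sudo ufw allow from ${ip} to any port ${port}`);
  }
  return plan;
}

const plan = ufwRulesFor(serverIps);
for (const [name, commands] of Object.entries(plan)) {
  console.log(`# run on ${name}`);
  commands.forEach((command) => console.log(command));
}
```

The script only prints the commands; they still have to be run on the server named in each comment.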

Step 3 – Enable replication in the MongoDB configuration file for each server

At this point, you have edited each server’s /etc/hosts file to configure hostnames that resolve to each server’s IP address. You have also opened each server’s firewall to allow the other two servers to access the default MongoDB port, 27017. You are now ready to configure the MongoDB installation on each server to enable replication.

This step outlines how to do so by editing MongoDB’s configuration file, /etc/mongod.conf. You must complete every procedure in this step on each server, but for demonstration purposes the examples use Mongo0.

On Mongo0, open the MongoDB configuration file with your favorite text editor.

sudo nano /etc/mongod.conf

Even though you have opened each server’s firewall to allow the other servers to access port 27017, MongoDB is currently bound to 127.0.0.1, the local loopback network interface. This means that MongoDB can only accept connections that originate on the server where it is installed.

To allow remote connections, in addition to 127.0.0.1, you must bind MongoDB to your server’s publicly routable IP address. This way, your MongoDB installation will be able to listen for connections from remote machines to your MongoDB server.

Find the network interfaces section. By default, it looks like this.

/etc/mongod.conf

. . .
# network interfaces
net:
  port: 27017
  bindIp: 127.0.0.1
. . .

Append a comma to the line beginning with bindIp:, followed by Mongo0’s hostname or public IP address. This example uses the hostname configured in Step 1.

/etc/mongod.conf

. . .
# network interfaces
net:
  port: 27017
  bindIp: 127.0.0.1,mongo0.replset.member
. . .

Next, find the line at the bottom of the file that says #replication:. It’s going to look something like this.

/etc/mongod.conf

. . .
#replication:
. . .

Uncomment this line by removing the pound sign (#). Then add a replSetName directive below it, followed by the name MongoDB will use to identify the replica set.

/etc/mongod.conf

. . .
replication:
  replSetName: "rs0"
. . .

In this example, the value of the replSetName directive is “rs0”. You can provide any name you want here, but using a descriptive name may help. Keep in mind, however, that each server’s mongod.conf file must have the same name after the replSetName directive so that their MongoDB instances become members of the same replica set.

Note that the replSetName directive is preceded by two spaces and that the name is wrapped in quotation marks ("); both are necessary for the configuration to be read correctly.

After you update these two sections of the file, net and replication, they will look like this.

/etc/mongod.conf

. . .
# network interfaces
net:
  port: 27017
  bindIp: 127.0.0.1,mongo0.replset.member
. . .
replication:
  replSetName: "rs0"
. . .

Save and close the file. Then make the same changes to the /etc/mongod.conf files on Mongo1 and Mongo2. Afterwards, the updated sections of Mongo1’s configuration file will look like this.

/etc/mongod.conf

. . .
# network interfaces
net:
  port: 27017
  bindIp: 127.0.0.1,mongo1.replset.member
. . .
replication:
  replSetName: "rs0"
. . .

Here’s what these sections look like in the Mongo2 configuration file.

/etc/mongod.conf

. . .
# network interfaces
net:
  port: 27017
  bindIp: 127.0.0.1,mongo2.replset.member
. . .
replication:
  replSetName: "rs0"
. . .

To reiterate, the hostname or IP address you add to each server’s bindIp directive must be that of the server whose mongod.conf file you are editing.
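
Since the only thing that differs between the three files is the hostname on the bindIp line, the two edited sections can be expressed as a small template. The sketch below is illustrative only: `mongodSections` is a hypothetical helper that prints the text for review and does not write to /etc/mongod.conf.

```javascript
// Renders the net and replication sections of mongod.conf for one member.
// YAML is indentation-sensitive, so each nested key is prefixed with two spaces.
function mongodSections(hostname, replSetName = "rs0", port = 27017) {
  return [
    "net:",
    `  port: ${port}`,
    `  bindIp: 127.0.0.1,${hostname}`,
    "",
    "replication:",
    `  replSetName: "${replSetName}"`,
  ].join("\n");
}

for (const n of [0, 1, 2]) {
  const hostname = `mongo${n}.replset.member`;
  console.log(`# sections for mongo${n}`);
  console.log(mongodSections(hostname));
  console.log("");
}
```

Every rendered file shares the same replSetName, while the bindIp line names only the server it belongs to.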

After making these changes to each server’s mongod.conf file, save and close each file. Then restart the mongod service on each server by issuing the following command.

sudo systemctl restart mongod

With that, you have enabled replication for the MongoDB instance on each server.

Note: At this point, you can use the nc command to test whether the firewall rules you added in Step 2 are correct. nc, short for _netcat_, is a utility for establishing network connections over TCP or UDP. It is useful for testing in cases like this because it lets you specify both an IP address and a port number when making a connection.

The following example nc command includes the -z option, which limits the tool to scanning for a listening daemon on the target server without sending it any data. Recall from the prerequisite installation tutorial that MongoDB runs as a service daemon, which makes this option well suited to testing connectivity. The command also includes the -v option, which increases its verbosity so that it returns more information than it otherwise would.

This example nc command tests the connection from Mongo0 to Mongo1.

nc -zv mongo1.replset.member 27017

The following output shows that Mongo0 can reach Mongo1 on the same port MongoDB uses.

Output
Connection to mongo1.replset.member 27017 port [tcp/*] succeeded!

You can test the connection between each pair of servers by repeating this command on each server and specifying the appropriate hostname or IP address.

After editing each server’s mongod.conf file to enable replication and restarting the mongod service, you can initiate the replica set and add each MongoDB instance as a member.

Step 4 – Start the replica set and add members

Now that you have configured all three MongoDB installations, you can open the MongoDB shell to initiate replication and add each instance as a member.

For demonstration purposes, the examples in this step use the MongoDB instance on Mongo0 to initiate the replica set. However, you can initiate replication from any server whose mongod.conf file has been configured correctly.

On Mongo0, open the MongoDB shell.

mongo

From the prompt, you can initiate a replica set by running the rs.initiate() method. Running the method by itself, however, only initiates replication on the machine where you run it; you would then need to add your other MongoDB instances by issuing an rs.add() method for each one.

Recall that MongoDB stores its data in JSON-like structures called _documents_. Because you have already edited the mongod.conf file on each server to configure all three MongoDB instances for replication, you can instead pass the rs.initiate() method a document that holds the configuration details for each member. This lets you initiate the replica set and add every member in one go, rather than running several separate methods.

To do this, begin an rs.initiate() method by typing the following and pressing ENTER.

rs.initiate(

The shell won’t register the rs.initiate() method as complete until you enter the closing parenthesis. Until then, the prompt will change from a greater-than sign (>) to an ellipsis (...).

Like objects in JSON, documents in MongoDB begin and end with curly braces ({ and }). To start the replica set’s configuration document, enter an opening curly brace.

{

MongoDB documents are made up of any number of _field and value_ pairs, which take the form field: value. The first field and value pair in this particular document must be the _id: field, which provides a name to identify the replica set. Its value must match the replSetName directive you set in the mongod.conf files, which in our example is “rs0”.

Enter this field and value pair, follow it with a comma, and then press ENTER to start a new line.

_id: "rs0",

Next, add a members: field. Instead of a single value, though, the members: field is followed by an array of documents, each representing a member to add to the replica set. In MongoDB documents, arrays are always enclosed in square brackets ([ and ]).

Add the members: field, followed by an opening square bracket to begin the array, and then press ENTER to move to the next line.

members: [

Now add a document containing two field and value pairs, separated by a comma, to represent the first member of the replica set. The first field in this document is another _id: field, which takes an integer used to identify the member internally. The second is the host: field, which must be followed by a string containing a hostname that resolves to an address where the member’s MongoDB instance can be reached.

{ _id: 0, host: "mongo0.replset.member" },

Note: If any of your MongoDB instances runs on a port other than MongoDB’s default, 27017, you must follow its hostname with a colon (:) and the port number, as in this example.

{ _id: 0, host: "mongo0.replset.member:27018" },

After entering the first document, enter one more for each of the other members of the replica set. Make sure the documents are separated by commas.

{ _id: 1, host: "mongo1.replset.member" },
{ _id: 2, host: "mongo2.replset.member" }

Next, end the array by entering a closing square bracket.

]

Finally, close the configuration document with a closing curly brace and close the method with a closing parenthesis.

})

All together, the rs.initiate() method will look like this.

> rs.initiate(
... {
... _id: "rs0",
... members: [
... { _id: 0, host: "mongo0.replset.member" },
... { _id: 1, host: "mongo1.replset.member" },
... { _id: 2, host: "mongo2.replset.member" }
... ]
... })

Assuming you entered every detail correctly, the method will run and initiate the replica set as soon as you press ENTER after typing the closing parenthesis. If the output includes "ok" : 1, the replica set was initiated successfully.

Output
{
    "ok" : 1,
    "$clusterTime" : {
        "clusterTime" : Timestamp(1612389071, 1),
        "signature" : {
            "hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
            "keyId" : NumberLong(0)
        }
    },
    "operationTime" : Timestamp(1612389071, 1)
}
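
The configuration document passed to rs.initiate() follows a regular pattern, so it can also be assembled from a list of member addresses rather than typed by hand. The sketch below is plain JavaScript with a hypothetical `buildReplSetConfig` helper. It only constructs and prints the object, which would still need to be passed to rs.initiate() in the MongoDB shell.

```javascript
// Builds a replica set configuration document from a list of member addresses.
// Each entry may be "hostname" or "hostname:port"; member _id values follow list order.
function buildReplSetConfig(replSetName, hosts) {
  return {
    _id: replSetName, // must match replSetName in every member's mongod.conf
    members: hosts.map((host, index) => ({ _id: index, host })),
  };
}

const cfg = buildReplSetConfig("rs0", [
  "mongo0.replset.member",
  "mongo1.replset.member",
  "mongo2.replset.member",
]);
console.log(JSON.stringify(cfg, null, 2));
```

For an instance on a non-default port, the list entry would carry the port itself, for example "mongo0.replset.member:27018".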

If the replica set was initiated as expected, you’ll notice that the MongoDB shell prompt changes from a lone greater-than sign (>) to one that shows the replica set’s name and the current member’s state.

MongoDB comes with several built-in methods you can use to manage a replica set and retrieve information about it. Among them, rs.help() is especially useful, as it returns a list of these replica set methods along with a description of what each one does.

rs.help()
Output
    rs.status()                               { replSetGetStatus : 1 } checks repl set status
    rs.initiate()                             { replSetInitiate : null } initiates set with default settings
    rs.initiate(cfg)                          { replSetInitiate : cfg } initiates set with configuration cfg
    rs.conf()                                 get the current configuration object from local.system.replset
    rs.reconfig(cfg)                          updates the configuration of a running replica set with cfg (disconnects)
    rs.add(hostportstr)                       add a new member to the set with default attributes (disconnects)
    rs.add(membercfgobj)                      add a new member to the set with extra attributes (disconnects)
    rs.addArb(hostportstr)                    add a new member which is arbiterOnly:true (disconnects)
    rs.stepDown([stepdownSecs, catchUpSecs])  step down as primary (disconnects)
    rs.syncFrom(hostportstr)                  make a secondary sync from the given member
    rs.freeze(secs)                           make a node ineligible to become primary for the time specified
    rs.remove(hostportstr)                    remove a host from the replica set (disconnects)
    rs.secondaryOk()                          allow queries on secondary nodes

    rs.printReplicationInfo()                 check oplog size and time range
    rs.printSecondaryReplicationInfo()        check replica set members and replication lag
    db.isMaster()                             check who is primary
    db.hello()                                check who is primary

    reconfiguration helpers disconnect from the database so the shell will display
    an error, even if the command succeeds.

After running rs.help() or one of these other methods, you may notice the prompt change again, this time to show the PRIMARY state.

This means that the MongoDB instance you are connected to has been elected as the replica set’s primary member.
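
Besides watching the prompt, you can see which member is the primary in the document returned by rs.status(). The sketch below works on a trimmed, hand-written stand-in for that document rather than real output: it assumes only the `members` array with its `name` and `stateStr` fields, and the `membersByState` helper is hypothetical.

```javascript
// Hand-written stand-in for the object rs.status() returns.
// Real output contains many more fields; only name and stateStr are used here.
const sampleStatus = {
  set: "rs0",
  members: [
    { _id: 0, name: "mongo0.replset.member:27017", stateStr: "PRIMARY" },
    { _id: 1, name: "mongo1.replset.member:27017", stateStr: "SECONDARY" },
    { _id: 2, name: "mongo2.replset.member:27017", stateStr: "SECONDARY" },
  ],
};

// Groups member names by their reported state.
function membersByState(status) {
  const groups = {};
  for (const member of status.members) {
    if (!groups[member.stateStr]) groups[member.stateStr] = [];
    groups[member.stateStr].push(member.name);
  }
  return groups;
}

console.log(membersByState(sampleStatus));
```

In the MongoDB shell, the same function could be given the result of rs.status() directly.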

Note that if you have additional nodes that you want to add to the replica set in the future, you can use rs.add() after configuring them, just as you configured the current replica set members in the previous step.

rs.add( "mongo3.replset.member" )

You can now close the MongoDB shell by pressing CTRL + C or by running the exit command.

exit

Your replica set is now up and running, and you can start integrating it with your application.

Warning: You may have noticed a warning similar to the following when you opened the MongoDB shell to initiate the replica set.

. . .
2021-02-03T21:45:48.379+00:00: Access control is not enabled for the database. Read and write access to data and configuration is unrestricted
. . .

This message indicates that you haven’t enabled access control for your database. According to the MongoDB documentation:

MongoDB uses Role-Based Access Control (RBAC) to manage access to MongoDB systems. A user is granted one or more roles that determine the user’s access to database resources and operations.

Because access control is not enabled on any of your MongoDB instances, anyone who gains access to any of the three servers in the replica set can also access the MongoDB instance on that server. This poses a significant security risk, since it means they could also gain access to your application data.

One way to eliminate this warning and add a layer of security to your replica set is to configure _keyfile authentication_. As mentioned in the introduction, the MongoDB documentation describes keyfiles as a “minimal form of security” that is “best suited for testing or development environments.”

Note that for production deployments, the MongoDB documentation instead recommends using X.509 certificates for internal member authentication. The process of obtaining and configuring X.509 certificates has many considerations and decisions that must be made on a case-by-case basis that are beyond the scope of this tutorial.

If you plan to use replica sets for testing or development, we strongly recommend that you follow our tutorial on how to configure key file authentication for MongoDB replica sets on Ubuntu 20.04.

Conclusion

Database replication is so widely used as a strategy for improving performance, availability, and data security that enabling some form of replication is recommended for any database used in a production environment. Replication is also versatile and can serve many different purposes in a data architecture, such as reporting or disaster recovery. The automatic failover feature of MongoDB’s replica sets makes them especially valuable, helping to ensure that your data remains highly available in the event of a failure.

If you want to learn more about MongoDB, we encourage you to check out our entire MongoDB tutorial collection.