Overview

1. What is MongoDB? Sum it up in one sentence

MongoDB is a database management system designed for web applications and internet infrastructure. In other words, MongoDB is a database, specifically a NoSQL database.

2. Why use MongoDB?

(1) MongoDB introduces the concepts of documents and collections and uses BSON (a JSON-like format) as its data model. Its structure is object-oriented rather than a two-dimensional table, so a user stored in MongoDB looks like this:

{ username: '123', password: '123' }

With this data model, MongoDB can deliver high read and write performance in production, and throughput is greatly improved compared with MySQL and other SQL databases.

(2) Easy to scale, with automatic failover. Scalability means data sets can be sharded so that the storage burden is spread across multiple servers. Automatic failover comes from replica sets: MongoDB detects whether the primary node is alive and, if it goes down, automatically promotes a secondary node to primary.

(3) Because the data model is object-oriented, it can represent rich, hierarchical data structures. For example, a blog system can embed comments directly in the article document, instead of creating three tables in MySQL to describe the relationship.
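For instance, a minimal sketch of an article document with embedded comments (the field names here are illustrative, not from the original):

db.articles.insert({
  title: "Why MongoDB",
  author: "smith",
  comments: [
    { user: "jones", text: "Nice post" },
    { user: "lee", text: "Thanks for sharing" }
  ]
})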

3. Main features

(1) Document data model. SQL databases are normalized, using primary and foreign key constraints to guarantee data integrity and uniqueness, so SQL databases are often used in systems with high data-integrity requirements. MongoDB is weaker than SQL databases in this respect and has no fixed schema. Because MongoDB imposes fewer such constraints, its stored data structures can be more flexible and storage is faster.

(2) Ad hoc query capability. MongoDB retains the relational database's ability to run ad hoc queries and use indexes (implemented on B-trees underneath). This is an advantage taken from relational databases that NoSQL stores such as Redis lack.

(3) Replication capability. MongoDB provides replica sets that distribute data redundantly across multiple machines, in order to provide automatic failover and extended read capacity.

(4) Speed and durability

The MongoDB driver implements "fire and forget" write semantics: a write issued through the driver returns success immediately (even if an error occurred). This makes writes faster, but of course brings some insecurity, since it depends entirely on the network.

MongoDB provides journaling, which works much like MySQL's bin-log: when an insert is needed, the operation is first recorded in the journal and only then applied to the actual data, so data is not corrupted by a power failure or a sudden process crash. The repair feature can read the journal to fix the data.
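As a sketch of this speed/durability trade-off, assuming a shell that supports per-operation write concerns (MongoDB 2.6+; collection name illustrative), j: true asks the server to acknowledge only after the write has reached the journal:

db.users.insert(
  { username: "smith" },
  { writeConcern: { w: 1, j: true } }  // slower than fire-and-forget, but journaled
)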

(5) Data expansion

MongoDB scales data through sharding. It can shard automatically and migrate chunks between shards, so that data is evenly distributed across the servers.

4. C/S (client/server) service model

The MongoDB core server is started by the mongod program. There is no need to configure how much memory MongoDB uses at startup: its design philosophy is that memory management is best left to the operating system, and this absence of memory configuration is a highlight of MongoDB's design. In addition, sharding is provided through the mongos routing server.

MongoDB's main client is the interactive JS shell started by mongo. With the JS shell you can talk to MongoDB directly in JavaScript and query MongoDB data with JS syntax, just as you query MySQL data with SQL statements. Driver packages for many languages are also provided for easy access from each language.

5. A complete set of command-line tools

mongodump and mongorestore: the standard tools for backing up and restoring databases. They output BSON and can be used to migrate a database.

mongoexport and mongoimport: import and export JSON, CSV, and TSV data; useful when data must be exchanged in multiple formats. mongoimport is also handy for the initial import of large data sets, although you should note beforehand that MongoDB usually benefits from some tweaking of the data model to get the most out of it.

mongosniff: a network sniffing tool for observing operations sent to the database. It basically converts the BSON going over the wire into shell statements that are easy to read.

In summary, MongoDB combines the best features of key-value stores and relational databases: thanks to its simplicity, working with data is extremely fast, scaling the database is relatively easy, and it still provides a sophisticated query mechanism. MongoDB should run on a 64-bit server, is best deployed on a machine of its own, and, being a database, also needs hot and cold backups.

The MongoDB shell

Since this article is not an API manual, the shell is covered here only as a basic introduction to the available functions and statements, mainly to show how convenient the MongoDB shell is. For specific shell syntax, consult the official documentation.

1. Switch databases

use dba

Creating a database explicitly is not required: databases and collections are created the first time a document is inserted into them, which is consistent with MongoDB's dynamic treatment of data. This simplifies and speeds up development and makes dynamic namespace allocation easy. If you are worried about databases or collections being created accidentally, you can turn on strict mode.

2. Insert syntax

db.users.insert({username:"smith"})
db.users.save({username:"smith"})

Difference: insert() reports an error if a document with the same primary key (_id) already exists, while save() replaces the existing content with the new content. For example:

If the collection already contains {_id: 1, name: "n1"}, then insert({_id: 1, name: "n2"}) reports a duplicate-key error, while save({_id: 1, name: "n2"}) changes n1 to n2.

Similarity: if the new document has no primary key, both add a new record.

For example, when no _id is given, both insert({name: "n2"}) and save({name: "n2"}) add a new document.
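A minimal shell sketch of the difference (collection name illustrative):

db.users.insert({_id: 1, name: "n1"})  // succeeds
db.users.insert({_id: 1, name: "n2"})  // duplicate key error: _id 1 already exists
db.users.save({_id: 1, name: "n2"})    // overwrites the document: name becomes "n2"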

3. Query syntax

db.users.find()
db.users.count()

4. Update syntax

db.users.update({username: "smith"}, {$set: {country: "Canada"}})  // set the smith user's country to Canada
db.users.update({username: "smith"}, {$unset: {country: 1}})  // remove the country field
db.users.update({username: "jones"}, {$set: {favorites: {movies: ["casablanca", "rocky"]}}})  // set an embedded document
db.users.update({"favorites.movies": "casablanca"}, {$addToSet: {"favorites.movies": "the maltese falcon"}}, false, true)  // upsert=false, multi=true: update every matching document

5. Delete syntax

db.foo.remove()  // delete all documents
db.foo.remove({"favorites.cities": "cheyenne"})  // delete by condition
db.foo.drop()  // drop the entire collection

6. Index related syntax

db.numbers.ensureIndex({num: 1})  // create an ascending index

7. Basic management syntax

db.numbers.stats()  // show collection status
db.shutdownServer()  // stop the database
db.help()  // list database-level commands
db.foo.help()  // list collection-level commands
// the Tab key auto-completes commands

The commands above are only simple examples. Suppose you had never learned any database syntax and started learning SQL query syntax and MongoDB query syntax at the same time: which would you find easier? If you use the Java driver to operate MongoDB, you will find that any query works much like the queries Hibernate provides; as long as you build a query condition object, querying is easy (examples are given below). So starting out with the MongoDB JS shell is no problem at all, and it is exactly this simple yet complete query mechanism that made me fall in love with MongoDB.

Using the Java driver

Connecting to MongoDB with the Java driver is very simple: add the dependency, then do basic create, read, update, and delete operations. After using the Java driver, I found that Spring's wrapper around MongoDB is not as good as the official driver itself. Here is a simple demonstration of how to use it.

1. Import jar packages using Maven

<dependency>
    <groupId>org.mongodb</groupId>
    <artifactId>mongodb-driver-sync</artifactId>
    <version>3.8.0-beta3</version>
</dependency>

2. Create an access client

MongoClient client = MongoClients.create("mongodb://10.201.76.94:27017");

3. Count the documents in a collection

public long count() {
    MongoClient client = this.getClient();
    MongoCollection<Document> collection = client.getDatabase("mongodb_db_name").getCollection("mongodb_collection_name");
    return collection.count();
}

4. Query the collection

public List<Document> find(Document params, Bson sort, int skip, int limit) {
    MongoClient client = this.getClient();
    MongoCollection<Document> collection = client.getDatabase("mongodb_db_name").getCollection("mongodb_collection_name");
    List<Document> list = new ArrayList<>(Integer.valueOf(config.getPro("sync_limit")));
    collection.find(params).sort(sort).skip(skip).limit(limit).forEach(new Block<Document>() {
        @Override
        public void apply(Document document) {
            list.add(document);
        }
    });
    return list;
}

Only a simple connection and simple MongoDB operations are shown here, but you can see how easy it is. The driver communicates with MongoDB over TCP sockets. If a query has more results than fit in the first returned batch, the driver sends a getMore command to the server to fetch the next batch.

When inserting data, the driver by default does not wait for a response from the server: it assumes the write succeeded, and the object ID is actually generated on the client side. This behavior can be changed by enabling safe mode through configuration, which checks for server-side insert errors.

Schema design principles

1. Pay attention to MongoDB features

Understand MongoDB's basic unit of data. A relational database has tables with rows and columns; MongoDB's basic unit is the BSON document, in which keys point to values of varying types. MongoDB offers ad hoc queries but does not support joins, while a simple key-value store can only fetch values by a single key. MongoDB does not support transactions, but it supports a variety of atomic update operations.

2. Pay attention to the read and write features of the system

For example: what is the read/write ratio, which queries are needed, how is the data updated, are there concurrency issues, and how structured does the data need to be? The system's requirements determine whether MySQL or MongoDB is the better fit.

3. Pay attention to the design pattern of MongoDB Schema

Embedding vs. referencing: use embedded documents when child objects always appear in the context of their parent; otherwise, store the child objects in a separate collection.

One-to-many relationship: on the "many" side, store an id field that points to the id of the document it belongs to, as in the sketch below.
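A minimal sketch of such a reference (collection and field names illustrative):

db.posts.insert({_id: 1, title: "hello"})
db.comments.insert({post_id: 1, text: "first!"})  // the "many" side points to the post's id
db.comments.find({post_id: 1})                    // fetch all comments for the post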

Many-to-many relationship: on one side of the relationship, use an array of ids pointing to the related objects.

Tree: use a path field; each node in the tree stores in its path field the ids of all of its ancestors, as sketched below.
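A minimal sketch of the path pattern (names illustrative):

db.categories.insert({_id: "mongodb", path: ["databases", "nosql"]})  // ancestor ids in order
db.categories.find({path: "nosql"})  // all descendants of the "nosql" node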

Dynamic attributes: you can index different dynamic attributes. If you need range queries over the attributes, store them as key-value pairs and index the uniform keys, as sketched below.
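A minimal sketch of the key-value pattern (names illustrative):

db.items.insert({attrs: [{k: "color", v: "red"}, {k: "size", v: 42}]})
db.items.ensureIndex({"attrs.k": 1, "attrs.v": 1})  // one index covers every dynamic attribute
db.items.find({attrs: {$elemMatch: {k: "size", v: {$gt: 40}}}})  // range query over one attribute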

About transactions: if transaction support is required, you have to choose another database or implement compensating transactions to work around the lack of them.

There are some principles to keep in mind when designing a schema, such as:

  • You cannot create useless indexes

  • You cannot store different types in the same field

  • You can’t put multiple entities in a collection and you can’t create large, deeply nested documents

  • You can’t create too many collections. Collections, indexes, and database namespaces are all limited

  • Cannot create a collection that cannot be sharded

4. Pay attention to the details of MongoDB

(1) Focus on the concept of database

A database is a logical and physical grouping of collections. MongoDB provides no syntax for creating a database: a database comes into existence only when a collection in it receives its first insert. Once created, a database is allocated a set of data files on disk; all of the database's collections, indexes, and other metadata are stored in these files.

db.stats()

(2) Focus on the concept of set

A collection is a container for structurally or conceptually similar documents. A collection name may contain numbers, letters, or the . character, but must begin with a letter or number.

Collection names are limited to 128 characters. The . character is actually useful in collection names: it provides a kind of virtual namespace as an organizational principle, and such collections are treated the same as any other. Every database also contains special collections:

db.system.namespaces.find()  // all namespace definitions in the current database
db.system.indexes.find()  // all index definitions in the current database

(3) Focus on documents

A document is made up of key-value pairs. In MongoDB all strings are UTF-8. Number types include double, int, and long. Dates are all UTC, so times in MongoDB appear 8 hours behind Beijing time. A document is limited to 16MB, which prevents the creation of unwieldy data structures, and small documents improve performance. The ideal batch-insert size is 10 to 200 documents, and a batch cannot exceed 16MB; see the sketch below.
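A minimal sketch of a batch insert within the suggested range (collection name illustrative):

var batch = [];
for (var i = 0; i < 100; i++) {
  batch.push({n: i, created: new Date()});  // dates are stored as UTC
}
db.numbers.insert(batch);  // 100 documents in one round trip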

Index and query optimization

1. Rules of thumb for indexing

(1) Indexes can significantly reduce the amount of work needed to fetch documents; the difference can be seen by comparing plans with the .explain() method

(2) When parsing a query, MongoDB selects an index by choosing the best plan. If it is not yet clear which index is best, it first tries the candidate indexes against each other and then keeps the optimal one for the query

(3) If a compound index on a-b exists, an index on a alone is redundant

(4) The order of the keys in a compound index matters, as the sketch below shows
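A minimal sketch of points (3) and (4) (names illustrative):

db.items.ensureIndex({a: 1, b: 1})  // compound index on a-b
db.items.find({a: 5})               // uses the compound index, so a separate {a: 1} index is redundant
db.items.find({b: 5})               // cannot use it: b is not a prefix of the compound index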

2. Index type

(1) Single-key index (2) Compound index (3) Unique index (4) Sparse index, used when the indexed field may be null or when many documents do not contain the indexed key; see the sketch below.
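A minimal sketch of a sparse (and unique) index, assuming a field that many documents lack (field name illustrative):

db.users.ensureIndex({email: 1}, {unique: true, sparse: true})  // documents without email are left out of the index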

3. Index construction

If the data set is large, building the index will take a long time and affect the performance of the program

db.currentOp()  // check the progress of a running index build

Indexes are also rebuilt when mongorestore runs. After a large-scale deletion, you can use

db.values.reIndex() 

to compact and rebuild the index.

4. Identify slow queries

(1) View slow query logs

grep -E '([0-9])+ms' mongod.log  // find slow operations in the log
db.setProfilingLevel(2)  // log all operations
db.setProfilingLevel(1)  // log only slow (>100ms) operations

(2) Analysis of slow query

db.values.find({}).sort({close: -1}).limit(1).explain()  // scanAndOrder in the output means the sort could not use an index

The cursor field shows BasicCursor when no index is used and BtreeCursor when one is

n is the number of documents returned

nscanned is the number of documents scanned, and indexBounds shows the index bounds used

Note that newer versions of MongoDB require an argument to explain(); otherwise only summary information is displayed. An example follows.
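For example, on MongoDB 3.0+ the verbosity is passed as an argument (a sketch reusing the query above):

db.values.find({}).sort({close: -1}).limit(1).explain("executionStats")  // includes documents examined, execution time, etc.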

MongoDB replica set

This section briefly shows how easy it is to build a MongoDB replica set, how powerful replica sets are, and how easy they are to monitor

1. Why use replica sets

Provides primary/secondary replication, hot backup, and failover capabilities

2. Construction method

rs.initiate()
rs.add("localhost:40001")
rs.add("localhost:40002",{arbiterOnly:true})

3. Monitor

db.isMaster()
rs.status()

4. How replica sets work

MongoDB's replica sets actually work much like MySQL's master-slave replication. First look at how data flows in MySQL:

master binlog -> slave relay-log -> slave binlog -> slave database. MongoDB, by contrast, relies mainly on the oplog:

primary oplog -> secondary oplog. Write operations are recorded and appended to the primary's oplog, and every secondary replicates that oplog: first, a secondary checks the timestamp of the last entry in its own oplog; second, it queries the primary's oplog for all entries newer than that timestamp; finally, it appends those entries to its own oplog and applies them to its own database. Secondaries use long polling so that new entries in the primary's oplog are applied immediately.

The slave node stops replication when the following occurs

  • If a secondary cannot find the point it last synchronized to in the primary's oplog, replication stops permanently

  • When that happens, the secondary is no longer guaranteed to be a perfect replica

The local database stores all replica set metadata and the oplog

  • replset.minvalid contains initial-sync information for this replica set member

  • system.replset stores the replica set's configuration document

  • system.indexes is the standard index-definition container

  • me and slaves are used mainly for write concerns

You can use the following command to view the replication status

db.oplog.rs.findOne()
  • ts holds the entry's BSON timestamp

  • t is the seconds since the epoch

  • i is a counter

  • op is the opcode

  • ns is the namespace affected

5. Heartbeat detection

Each replica set member pings all other members once every second, and you can see the last heartbeat detection timestamp and health status of the node through rs.status().

6. Failover

Not much needs to be said here, except for one special scenario: if both the secondary and the arbiter are killed and only the primary is left, the primary demotes itself to a secondary.

7. Commit and rollback

If data from the primary has not yet been written to the secondaries, it is considered uncommitted. When such a primary steps down to become a secondary, a rollback is triggered: the data never written to the secondaries is removed, and it can be recovered later from the BSON files in the rollback subdirectory.

8. Drivers and replication

(1) Connecting to a single node

You can only connect to the primary. If you connect to a secondary, writes are rejected; but without safe mode, Mongo's fire-and-forget behavior swallows the rejection exception.

(2) Connecting to a replica set

Writes can fail over automatically, but failures can still occur while the replica set is holding a new election. Without safe mode, a write may appear to succeed even though it actually failed.

(3) Write concerns

Write concerns let you confirm whether data has actually been written to MongoDB, but they cost performance, so you must trade speed against durability; see the sketch below.
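A minimal sketch of a write concern on a replica set (values illustrative):

db.products.insert(
  { sku: "a123" },
  { writeConcern: { w: "majority", wtimeout: 5000 } }  // wait until a majority of members acknowledge
)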

Sharding

Sharding means partitioning the database. This section briefly summarizes why sharding is used, how it works, and how to operate it.

1. Why sharding

When the data volume grows too large, indexes and the working set take up more and more memory, so the load needs to be spread across machines with sharding

2. How sharding works

(1) Sharding component

Shards: each shard is a replica set

Mongos router: A router that directs read and write requests to the appropriate shard

Config servers: persist the sharded cluster's metadata, including the global cluster configuration; the location of every database, collection, and particular range of data; and a change log that preserves the history of chunk migrations between shards. The config servers do not run as a replica set; mongos commits information to them in two phases to keep them consistent.

(2) The core operation of sharding

Sharding a collection: a collection is partitioned into ranges of some attribute, and MongoDB uses the so-called shard key to place each document within one of those ranges

Chunks: a chunk is a continuous range of shard-key values within a shard; shards are made up of chunks, and the chunks together hold all of MongoDB's data

(3) Split and migration

Chunk splitting: initially there is only one chunk, and a split is triggered when a chunk reaches the maximum size of 64MB or 100,000 documents. The original range is split in half, producing two chunks with the same number of documents each.

Migration: when shards hold noticeably different amounts of data, migration occurs; for example, if shard A holds much more data, some of its chunks are moved to shard B. The sharded cluster is managed by a software process called the balancer, whose task is to keep data evenly distributed across the shards. When the difference in chunk count between the shard with the most chunks and the one with the fewest exceeds 8, the balancer starts a balancing round.

3. Building a sharded cluster

Start two replica sets, three configuration servers, and one Mongos process

Configuring sharding:

sh.help()  // view sharding help
sh.addShard()  // add a shard
db.getSiblingDB("config").shards.find()  // view the shard list
sh.status()  // view shard details
sh.enableSharding("cloud-docs")  // enable sharding for a database
db.getSiblingDB("config").databases.find()  // view the database list
sh.shardCollection("cloud-docs.spreadsheets", {username: 1, _id: 1})  // shard a collection
db.getSiblingDB("config").collections.findOne()  // view the collection list
db.chunks.count()  // view the number of chunks
db.chunks.findOne()  // view chunk information
db.changelog.count({what: "split"})  // view the chunk-split log
db.changelog.find({what: "moveChunk.commit"}).count()  // view the migration log

4. Queries and indexes on shards

(1) Shard query types

Targeted queries: the query contains the shard key

Global (scatter/gather) queries: the query does not contain the shard key

Query process: the query is routed to specific shards via the shard key; once it reaches a shard, the shard decides which index to use to execute it

(2) Index

Each shard maintains its own index. When an index is declared on a shard collection, each shard builds a separate index for its portion of the collection. The shard collection on each shard should have the same index.

Sharded collections allow unique indexes only on the _id field and the shard key; unique indexes elsewhere are not allowed because enforcing them would require communication between shards, which is complicated to implement.

When a shard is created, an index is created based on the shard key.

5. Choosing a shard key

(1) The shard key cannot be modified later, so choosing it well is very important. (2) Inefficient shard keys:

Poor distribution: using the BSON object id as the shard key makes all newly inserted documents fall into one small continuous range, so inserts cannot be spread out

Lack of locality: an ascending shard key has a clear direction, while a completely random shard key has none. The former cannot spread inserts across shards, while the latter spreads inserts but loses locality, for example when an MD5 hash is used as the shard key

(3) Ideal sharding key

Distribute the inserted data evenly across the shards

Ensure that CRUD operations can take advantage of locality with sufficient granularity for block splitting

Shard keys that meet these requirements typically consist of two fields, the first coarse-grained and the second fine-grained:
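A minimal sketch of such a coarse-plus-fine key, matching the spreadsheets example used earlier:

sh.shardCollection("cloud-docs.spreadsheets", {username: 1, _id: 1})  // username is coarse-grained, _id is fine-grained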

6. Sharding in production

(1) Deployment topology

  • Replicating mongod nodes: each needs its own deployment server

  • Config servers: a config server does not need a machine of its own

Where possible, distribute members across data centers.

(2) Minimum requirements

  • Every replica set member, whether a full node or an arbiter, needs to be placed on a different machine, and every replicating member needs a machine of its own

  • The replica set arbiter is lightweight and can share a machine with other processes

  • The configuration server can also choose to share a machine with other processes

(3) Configuration precautions

Estimate the cluster size in advance; the following commands manipulate the chunks and shards of an existing collection

sh.splitAt("cloud-docs.spreadsheets", {username: "chen", _id: ObjectId("")})  // split a chunk manually
sh.moveChunk("cloud-docs.spreadsheets", {username: "chen"}, "shardB")  // move a chunk to shardB manually
db.runCommand({removeshard: "shard-1/arete:30100,arete:30101"})  // remove a shard
db.runCommand({moveprimary: "test", to: "shard-0-test-rs"})  // move the primary shard

(4) Backing up a sharded cluster

To back up a sharded cluster, first stop the balancer

db.settings.update({_id: "balancer"}, {$set: {stopped: true}}, true)  // stop the balancer (run against the config database)
sh.setBalancerState(false)  // equivalent helper
db.locks.find({_id: "balancer"})  // check the balancer lock; a state greater than 0 means balancing is still in progress
sh.isBalancerRunning()  // check whether a balancing round is running

Deployment and management

1. Deployment

(1) Deployment architecture

Use a 64-bit machine; on a 32-bit machine MongoDB is limited to at most about 1.5GB of memory

(2) CPU

MongoDB uses the CPU to retrieve data, and a CPU bottleneck appears only once indexes and the working set fit in memory. If CPU usage is saturated, check the slow query log to see whether a query is the cause and whether adding an index would solve it. For writes, MongoDB uses only one core at a time, so if writes are frequent, sharding can solve the problem

(3) Memory

Ample memory is MongoDB's performance guarantee; if the working set exceeds memory, performance degrades because data must keep being loaded into memory

(4) Hard disk

By default, MongoDB forces a sync of data to disk every 60 seconds (the background flush), which generates I/O. On restart, MongoDB loads data from disk into memory, so fast disks shorten sync and warm-up times

(5) File system

Use the ext4 or XFS file system

Disable recording of the last access time (e.g. the noatime mount option):

vim /etc/fstab

(6) File descriptor

Linux's default file descriptor limit is 1024, which needs to be raised substantially

(7) Clock

Use an NTP server to keep the clocks of MongoDB nodes in sync

2. Security

(1) Bind IP addresses

Start mongod with the --bind_ip option

(2) Authentication

Start mongod with the --auth option

db.addUser("<username>", "<password>", true)  // create a user; the last parameter marks the user read-only

(3) Replica set authentication

Use a keyFile; note that the keyFile's permissions must be 600, otherwise mongod will fail to start

3. Import and export data

mongoimport
mongoexport

4. Server configuration

(1) Topology structure

A replica set requires at least two full nodes; the arbiter does not need a server of its own

(2) Journaling logs

With journaling, the journal is written first: data is not written directly to the data files on disk but goes to memory first. Journaling has a performance cost, so one option is to disable it on the primary and enable it on the secondaries; a dedicated solid-state drive can also be used for the journal

On insert, the driver can wait for the journal write before acknowledging, but this hurts performance considerably.

5. Log

logpath  // specifies where logs are stored
-vvvvv  // the more v's, the more detailed the output
db.runCommand({logRotate: 1})  // rotates the log

6. Database monitoring commands

(1) serverStatus

  • globalLock shows the total time the server has spent in the write lock

  • mem shows memory usage

  • bits shows the machine's word size

  • resident is the physical memory in use

  • virtual is the virtual memory in use

(2) top

(3) db.currentOp()

7. mongostat

Dynamically displays MongoDB activity statistics

8. Web console

Listens on the port 1000 above the current mongod's listening port

9. Backup and restoration

(1) mongodump

Exports database contents as BSON files, which mongorestore can read and restore

(2) mongorestore

Restore the exported BSON file to the database

(3) Back up original data files

db.runCommand({fsync: 1, lock: true})  // flush writes to disk and lock the database for a consistent copy

db.$cmd.sys.unlock.findOne() requests the unlock, but the database is not unlocked immediately; verify with db.currentOp().

10. Compaction and repair

(1) Repair

db.runCommand({repairDatabase: 1}) repairs a single database. A repair reads and rewrites all data files and rebuilds the indexes, guided by the journal files

(2) Compaction

db.spreadsheets.reIndex() rebuilds the collection's indexes. db.runCommand({compact: "spreadsheets"}) rewrites the data files and rebuilds all of the collection's indexes; run it during downtime or on a secondary. If it must run on the primary, add the force parameter, and remember that compact takes a write lock.

11. Performance tuning

(1) Monitor disk status

iostat

(2) Check indexes and queries to improve performance

In general, scan as few documents as possible.

Ensure that there are no redundant indexes; they waste disk space, consume more memory, and add work to every write

(3) Add memory

db.stats()  // check dataSize and indexSize; if their sum exceeds memory, performance suffers

If storageSize is more than twice dataSize, disk fragmentation is affecting performance and a compaction is needed.