Preface

This article focuses on MongoDB storage features and internal principles.

In the next article, we will build a Replica Set + Sharded Cluster together.


Storage engines

WiredTiger engine

1. A new engine introduced in the MongoDB 3.x series; recommended.

2. Supports higher read/write load and concurrency.

All write requests are based on document-level locks,

so multiple clients can update different documents in a collection at the same time.

This fine-grained locking supports higher read/write loads and concurrency.

For production environments, more CPUs can effectively improve WiredTiger performance,

because its I/O is multithreaded.


3. Configure cache

You can set the amount of memory the engine uses with the "cacheSizeGB" parameter in the configuration file.

This memory is used to cache working-set data (indexes, namespaces, uncommitted writes, query buffers, and so on).

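As a sketch, the cacheSizeGB setting might appear in mongod.conf like this (the 4 GB value is purely illustrative, not a recommendation):

```yaml
# mongod.conf sketch: cap the WiredTiger cache (value is illustrative)
storage:
  wiredTiger:
    engineConfig:
      cacheSizeGB: 4
```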

4. Journal: a write-ahead transaction log


A. The journal is a write-ahead transaction log that ensures data durability.

B. Every 60 seconds (by default), or when the data to be written reaches 2 GB, WiredTiger flushes the journal file to disk and makes a checkpoint, which indicates that all earlier data has been persisted in the data files; subsequent data changes live in memory and in the journal.

C. For a write operation, the data is first written durably to the journal, and the change is then applied in memory. When the conditions are met, a new checkpoint is made; until then, the data exists durably only in the journal, not yet in the MongoDB data files. If mongod exits unexpectedly before a checkpoint and is later restarted, the data can be recovered from the journal.

D. The journal is synced to disk every 100 ms by default, and a new journal file is created for every 100 MB of data. The journal uses snappy compression by default.

E. Journaling can be disabled on mongod, which reduces overhead to some extent. For a single-node mongod, disabling the journal means an unclean shutdown may lose the data written since the last checkpoint (changes not yet committed to the data files). For a replica set, durability is somewhat better, but absolute safety is still not guaranteed (for example, if all nodes in the replica set exit almost simultaneously).
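The journal behavior described above corresponds roughly to these mongod.conf settings (the values shown are the defaults discussed; the dbPath is illustrative):

```yaml
# mongod.conf sketch: journal and checkpoint-related settings (defaults shown)
storage:
  dbPath: /data/db
  syncPeriodSecs: 60        # flush/checkpoint interval
  journal:
    enabled: true           # disable only if you accept losing un-checkpointed data
    commitIntervalMs: 100   # journal synced to disk every 100 ms
```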

MMAPv1 engine

1. The original storage engine; it directly uses system-level memory-mapped files.

2. High performance for inserts, reads, and in-place updates (updates that do not cause the document size to grow).

3. However, MMAPv1 locks at the collection level, so only one write operation can run on a given collection at a time; compared with WiredTiger, write concurrency is weaker.

4. For production environments, more memory makes the engine more efficient and reduces the frequency of page faults.

5. However, because of this lock-granularity limitation, multi-core CPUs bring it little benefit.

6. This engine will not use swap space, but wiredTiger requires some swap space

7. For large memory-mapped files, modifying data in the middle of the file in a way that grows a record should be avoided, because relocating the record involves large-scale adjustment of index references.

8. All records are stored contiguously on disk. When a document grows too large, MongoDB must allocate a new record (the old record is marked deleted, and the new record is reallocated at the end of the file).

9. This means that mongodb also needs to update the document’s index (offset to the new record), which takes more time and storage overhead than in-place updates.

10. Therefore, if your mongodb usage scenario has a lot of these updates, MMAPv1 may not be the right engine for you

11. Without an index, the order of documents returned by a read cannot be guaranteed.

12. Since 3.0, MMAPv1 defaults to "Power of 2 Sized Allocations," so each document's record consists of the real data plus some padding. The padding allows the document to grow moderately during updates, minimizing the chance of reallocating the record. Reallocation also causes disk fragmentation (the old record's space).

Power of 2 Sized Allocations

1. By default, MMAPv1 uses this strategy for space allocation. Each record size is a power of 2, such as 32, 64, 128, 256 ... up to 2 MB; if the document is larger than 2 MB, the allocation is a multiple of 2 MB (2 MB, 4 MB, 6 MB, and so on).

2. Two advantages

  • The disk-fragmentation space created by deletes or by updates that grow a document (growth means new space is allocated for the document and the old space is marked deleted) can be reused by later inserts.

  • The padding allows for a limited increase in document size without the need to reallocate space with each update.

3. MongoDB also provides an optional "No Padding Allocation" strategy, for when you are sure the workload is mostly inserts and in-place updates with few deletes. This policy effectively saves disk space, making data more compact and disk utilization higher.
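The allocation rule above can be sketched in a few lines; this is a simplified illustration of the rounding behavior, not MongoDB's actual implementation (the 32-byte minimum is an assumption taken from the size list above):

```java
// Sketch of MMAPv1's "Power of 2 Sized Allocations": record sizes round up
// to the next power of two below 2 MB, and to a multiple of 2 MB above that.
public class PowerOf2Alloc {
    static final long TWO_MB = 2L * 1024 * 1024;

    static long recordSize(long docBytes) {
        if (docBytes > TWO_MB) {
            // Round up to the next multiple of 2 MB.
            return ((docBytes + TWO_MB - 1) / TWO_MB) * TWO_MB;
        }
        long size = 32; // assumed minimum allocation for this sketch
        while (size < docBytes) {
            size <<= 1; // double until the document fits
        }
        return size;
    }

    public static void main(String[] args) {
        System.out.println(recordSize(100));       // 128
        System.out.println(recordSize(1000));      // 1024
        System.out.println(recordSize(3_000_000)); // 4194304 (2 x 2 MB)
    }
}
```

The padding is then simply the difference between the allocated record size and the document's real size.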

Note: since MongoDB 3.2, the default storage engine is WiredTiger, which greatly improves storage performance. You are advised to upgrade to 3.2+.

Capped Collections

1. The size is a fixed value, similar to a reusable ring buffer.

If the space fills up, new inserts overwrite the oldest documents. There are usually no deletes or updates on a capped collection, so this type of collection can sustain very high write and read rates.

2. There is usually no need to index such a collection, because inserts are appends and reads are iterations, with almost no random reads.

3. In replica set mode, the oplog is implemented with a capped collection.

4. Capped Collection is designed to hold the “most recent” document of a certain size

db.createCollection("capped_collections",

new CreateCollectionOptions()

.capped(true)

.sizeInBytes(100 * 1024 * 1024) // capped collections also require a maximum size in bytes

.maxDocuments(6552350)

.usePowerOf2Sizes(false).autoIndex(true)); // there are no updates involved, so power-of-2 sizing can be skipped


5. Similar to a "FIFO" queue; as a bounded queue, it suits data caching and message-style storage.

6. Capped collections support update, but it is generally not recommended: if the update grows the document, the operation fails. Only in-place updates work, and an appropriate index is needed to make them efficient.

7. Individual documents cannot be removed from a capped collection; to delete everything, drop the collection.

8. autoIndex indicates that the default index on the _id field is created.

Data Model

1. MongoDB supports embedded documents, i.e., the value of a field in a document can itself be a document.

2. If the size of an embedded document is dynamic, for example a user can have a varying number of cards, the document may keep growing until it exceeds its "power of 2" allocation, triggering space reallocation and its performance overhead.

3. In that case, we need to store the embedded documents separately, as one or more documents in an additional collection, such as the card list in a Card collection.

4. If the referenced document is small, it can be embedded; if it is large, storing it separately is recommended. Another advantage of embedded documents is write atomicity (a single document is always updated atomically).
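A sketch of the two modeling options for the user/card example (collection and field names are illustrative, not from the source):

```
// Embedded: cards live inside the user document (writes are atomic,
// but the document grows as cards are added)
{ "_id": 1, "name": "alice", "cards": [ { "no": "6222-0001" }, { "no": "6222-0002" } ] }

// Referenced: each card is a separate document in its own "card" collection,
// linked back to the user by an id field
{ "_id": 101, "user_id": 1, "no": "6222-0001" }
{ "_id": 102, "user_id": 1, "no": "6222-0002" }
```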

Indexes

1. Indexes improve query performance. By default, a unique index is created on the _id field.



2. Because indexes occupy both memory and disk, we should create only a limited number of indexes, and avoid duplicate indexes.



3. Each index requires at least 8 KB of space, and update and insert operations cause index adjustments, which slightly affects write performance. Indexes benefit only read operations, so applications with a high read/write ratio gain the most from indexing.




Splitting large collections

Take, for example, a collection that stores logs.

The logs have two types, "dev" and "debug", and the documents look roughly like this:

{"log": "dev", "content": "..."}, {"log": "debug", "content": "..."}

The number of documents of the two types is roughly equal.

For queries, even with an index on the log field, the index is not efficient,

so consider placing them in two collections instead, such as log_dev and log_debug.


Data life cycle management

MongoDB provides an expire mechanism:

you can specify how long a document is kept, and it is automatically deleted after it expires; this is the TTL feature.

This feature is useful in many situations,

for example, "captchas are valid for 15 minutes," "messages are valid for 7 days," and so on.

MongoDB runs a background thread to remove expired documents.



You need to create a "TTL index" on a date field.

For example, insert a document such as {"check_code": "101010", "created": new Date()},

where the created field holds the current system time (a Date). Then we create a TTL index on the created field:



collection.createIndex(new Document("created", 1), new IndexOptions().expireAfter(15L, TimeUnit.MINUTES)); // expire after 15 minutes



When a document is inserted into the collection, its created time is the current system time,

and the created field carries a "TTL" index of 15 minutes.

The MongoDB background thread scans documents and compares (created time + 15 minutes) with the current time;

if a document is found to be expired, the index entry is deleted (along with the document).



In some cases, you may need "expiry at a specified time."

You just need to adapt the document and index above:

set created to the "target time" and expireAfter to 0.


Architectural patterns

Replica Set

Typically, three peer nodes form a "replica set" cluster,

with roles such as primary and secondary.

The primary handles read/write requests; secondaries can serve read requests (depending on read preference).

Secondaries follow the primary and apply its writes.

If the primary fails, the cluster elects a new primary; this is the failover mechanism, i.e., an HA architecture.

A replica set addresses single points of failure and is the smallest deployment unit of MongoDB high availability;

of course, each shard node in a sharded cluster can also be a replica set to improve data availability.


Sharded cluster

One means of scaling data horizontally.

The downside of the replica set architecture is that cluster data capacity is limited by the disk size of a single node;

if the data volume keeps growing, expanding it becomes very difficult, so the sharding model is needed to solve this problem.

The data of the entire collection is distributed across multiple mongod nodes according to the shard key,

that is, each node holds part of the collection while the cluster holds all of the data.

In principle, sharding can support terabytes of data.


System configuration

  • You are advised to deploy MongoDB on Linux, select an appropriate underlying file system (such as ext4), and enable appropriate swap space.

  • Whether it’s a MMAPV1 or wiredTiger engine, there’s always a direct benefit to having more memory

  • To improve file access efficiency, disable atime for data store files. The atime value is changed every time a file is accessed, indicating the time when the file was last accessed

  • Raise the open-file limit for the mongod process, for example: ulimit -n 65535.
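The two OS tweaks above (noatime and the file-descriptor limit) might look like this; the device, mount point, and user name are placeholders:

```
# /etc/fstab: mount the data volume with atime updates disabled
/dev/sdb1  /data/db  ext4  defaults,noatime  0 0

# /etc/security/limits.conf: raise the open-file limit for the mongod user
mongod  soft  nofile  65535
mongod  hard  nofile  65535
```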

Data Files Storage (MMAPV1 engine)

MongoDB data is stored in the underlying file system.

For example, suppose dbpath is set to the "/data/db" directory,

we create a database named "test" with a collection "sample",

and then insert several documents into the collection. The list of files generated under dbpath looks like this:




You can see that the test database currently has six data files.

Each file name consists of the database name plus a numeric sequence;

the sequence starts at 0 and increases by one. Data files start at 16 MB and double in size each time (16 MB, 32 MB, 64 MB, 128 MB ...).

By default, the maximum size of a single data file is 2 GB;

if the smallFiles option is set (in the configuration file), the maximum is 512 MB.

Each database in MongoDB supports a maximum of 16,000 data files, i.e., about 32 TB of data;

if smallFiles is set, the maximum amount of data in a single database is about 8 TB.
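A quick arithmetic sketch of the capacity figures quoted above (this is simple math, not driver code):

```java
// Sanity-check the per-database capacity limits of MMAPv1 data files.
public class MmapCapacity {
    // Total capacity in TB, given a per-file size in GB and a file-count limit.
    static double totalTB(double fileSizeGB, long maxFiles) {
        return fileSizeGB * maxFiles / 1024;
    }

    public static void main(String[] args) {
        // Default 2 GB files, at most 16,000 files per database: ~32 TB.
        System.out.println(totalTB(2.0, 16_000));  // 31.25
        // With smallFiles (512 MB files): ~8 TB.
        System.out.println(totalTB(0.5, 16_000));  // 7.8125
    }
}
```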

If your database has a large number of data files,

you can use the directoryPerDB configuration option to place each DB's data files in its own directory.

When data is being written to the last data file,

MongoDB immediately preallocates the next data file;

you can turn this behavior off with the "--noprealloc" startup option.


Reference documentation

https://blog.csdn.net/quanmaoluo5461/article/details/85164588
