Preface

Like MySQL, MongoDB supports multiple storage engines, each suited to different scenarios. You can pass storage engine options when creating a collection, or select the storage engine when starting mongod, as follows:

// Set storage engine options for the collection
db.createCollection(
   'users',
   { storageEngine: { wiredTiger: { configString: "access_pattern_hint=random" } } });

// Select the storage engine when starting mongod for a data directory
mongod --storageEngine wiredTiger --dbpath <database path>

Since MongoDB 3.2, the default storage engine is WiredTiger. This section describes the features of WiredTiger.

Document-level concurrency

The WiredTiger storage engine uses document-level concurrency control for write operations, so multiple clients can modify different documents in the same collection at the same time (somewhat like row-level locking in a relational database).

WiredTiger uses optimistic concurrency control for most read and write operations. It takes only intent locks at the global, database, and collection levels. When the storage engine detects a conflict between two operations, one of them incurs a write conflict and MongoDB transparently retries that operation.
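The retry behavior described above can be illustrated with a minimal sketch. The `ToyStore` class, its version counter, and `updateWithRetry` are hypothetical names invented for this example; they are not WiredTiger or MongoDB APIs, and the real engine detects conflicts internally rather than through an explicit version field.

```javascript
// Toy in-memory store: every document carries a version counter.
// Hypothetical illustration only, not how WiredTiger stores data.
class ToyStore {
  constructor() {
    this.docs = new Map();
  }

  insert(id, value) {
    this.docs.set(id, { version: 1, value });
  }

  read(id) {
    const doc = this.docs.get(id);
    return { version: doc.version, value: doc.value };
  }

  // The commit succeeds only if the document has not changed since it was read.
  commit(id, expectedVersion, newValue) {
    const current = this.docs.get(id);
    if (current.version !== expectedVersion) {
      return false; // write conflict
    }
    this.docs.set(id, { version: expectedVersion + 1, value: newValue });
    return true;
  }
}

// Optimistic update: read, compute, try to commit, and retry on conflict.
// onRead is a hook used here only to simulate a competing writer.
function updateWithRetry(store, id, updateFn, maxRetries = 5, onRead = () => {}) {
  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    const snapshot = store.read(id);
    onRead(attempt);
    if (store.commit(id, snapshot.version, updateFn(snapshot.value))) {
      return attempt; // number of attempts that were needed
    }
  }
  throw new Error('too many write conflicts');
}

const store = new ToyStore();
store.insert('u1', { visits: 0 });

// A competing writer modifies the document during the first attempt.
const attempts = updateWithRetry(
  store,
  'u1',
  (doc) => ({ ...doc, visits: doc.visits + 1 }),
  5,
  (attempt) => {
    if (attempt === 1) {
      const other = store.read('u1');
      store.commit('u1', other.version, { ...other.value, visits: other.value.visits + 10 });
    }
  }
);

console.log(attempts, store.read('u1').value.visits); // 2 11
```

The first attempt fails because the competing writer committed first; the second attempt re-reads the document and succeeds, so neither update is lost.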

Update mechanism

WiredTiger does not update a document in place. When you update a field of a document, the engine actually writes a complete new version of the document, and the old version is deleted.
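As a rough analogy for this behavior, the sketch below replaces a whole document object on every update instead of mutating it. The `collection` map and the `insertDoc` and `updateDoc` helpers are hypothetical names for illustration and say nothing about WiredTiger's on-disk format.

```javascript
// Hypothetical in-memory "collection": id -> immutable document.
const collection = new Map();

function insertDoc(id, doc) {
  collection.set(id, Object.freeze({ ...doc }));
}

// An update never mutates the stored document. It builds a complete new
// document and swaps it in; the old version is no longer referenced.
function updateDoc(id, changes) {
  const oldDoc = collection.get(id);
  const newDoc = Object.freeze({ ...oldDoc, ...changes });
  collection.set(id, newDoc);
  return { oldDoc, newDoc };
}

insertDoc('u1', { name: 'Ann', age: 30 });
const { oldDoc, newDoc } = updateDoc('u1', { age: 31 });

console.log(oldDoc === newDoc);      // false: a new document was created
console.log(oldDoc.age, newDoc.age); // 30 31
```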

Snapshots and checkpoints

WiredTiger uses multi-version concurrency control (MVCC). At the start of an operation, WiredTiger provides the operation with a point-in-time snapshot of the data. A snapshot presents a consistent view of the in-memory data. When writing to disk, WiredTiger writes all the data in a snapshot to disk in a consistent way across the data files. The data that is now durable acts as a checkpoint in the data files. A checkpoint guarantees that the data files are consistent up to and including the last checkpoint, so a checkpoint can serve as a recovery point.

While a new checkpoint is being written, the previous checkpoint remains valid. Therefore, even if MongoDB terminates or hits an error while writing a new checkpoint, it can recover from the last valid checkpoint after a restart. The new checkpoint becomes available when WiredTiger's metadata table is atomically updated to point to it. Once the new checkpoint is available, WiredTiger can free the storage space used by the old checkpoint.
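The checkpoint switch described above can be sketched as follows. `ToyCheckpointStore`, its `metadata` pointer, and the `failMidway` option are hypothetical and exist only to simulate a crash; the real engine works on data files and a metadata table, not on JavaScript objects.

```javascript
// Hypothetical model of checkpoint switching. Not a WiredTiger API.
class ToyCheckpointStore {
  constructor() {
    this.checkpoints = new Map();      // checkpoint id -> copy of the data
    this.metadata = { current: null }; // the only place that names the valid checkpoint
    this.nextId = 1;
  }

  writeCheckpoint(data, { failMidway = false } = {}) {
    const id = this.nextId++;

    // Step 1: write the new checkpoint next to the old one.
    this.checkpoints.set(id, JSON.parse(JSON.stringify(data)));

    if (failMidway) {
      // Simulated crash before the metadata update: the old checkpoint
      // is still the one that metadata points to.
      throw new Error('crash while writing checkpoint ' + id);
    }

    // Step 2: a single assignment makes the new checkpoint the valid one.
    this.metadata.current = id;

    // Step 3: only now can older (or incomplete) checkpoints be freed.
    for (const key of [...this.checkpoints.keys()]) {
      if (key !== id) this.checkpoints.delete(key);
    }
    return id;
  }

  // Recovery always starts from the checkpoint that metadata points to.
  recover() {
    return this.checkpoints.get(this.metadata.current);
  }
}

const cp = new ToyCheckpointStore();
cp.writeCheckpoint({ users: 1 });

let crashed = false;
try {
  cp.writeCheckpoint({ users: 2 }, { failMidway: true });
} catch (err) {
  crashed = true;
}

const recovered = cp.recover();
console.log(crashed, recovered.users); // true 1

cp.writeCheckpoint({ users: 3 });
console.log(cp.recover().users, cp.checkpoints.size); // 3 1
```

Because the switch is a single pointer update, a reader of the metadata sees either the old checkpoint or the new one, never a half-written state.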

Data compression

With WiredTiger, MongoDB supports compression for all collections and indexes. Compression trades a small amount of CPU for reduced storage space. By default, WiredTiger uses the Snappy compression library for block compression of collections (Snappy compresses less than zlib but uses less CPU) and prefix compression for indexes. For collections, you can also use the zlib or zstd (available since version 4.2) compression libraries. The compression algorithm for a collection can be specified when the collection is created, as shown below:

// Specify zlib as the compression library when creating the collection
db.createCollection('users', {
  storageEngine: {
    wiredTiger: {
      configString: 'block_compressor=zlib'
    }
  }
});

// Enable prefix compression for an index
db.employees.createIndex({age: 1}, {
  storageEngine: {
    wiredTiger: {
      configString: 'prefix_compression=true'
    }
  }
});
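To give an intuition for why prefix compression suits indexes, here is a minimal sketch of the general technique on sorted string keys. The functions `prefixCompress` and `prefixDecompress` and the sample keys are hypothetical; WiredTiger's actual page format differs, and this only shows the idea of storing each key as a shared-prefix length plus a suffix.

```javascript
// Length of the common prefix of two strings.
function commonPrefixLength(a, b) {
  let i = 0;
  while (i < a.length && i < b.length && a[i] === b[i]) i++;
  return i;
}

// Store each key as (number of characters shared with the previous key, remaining suffix).
function prefixCompress(sortedKeys) {
  let previous = '';
  return sortedKeys.map((key) => {
    const shared = commonPrefixLength(previous, key);
    previous = key;
    return { shared, suffix: key.slice(shared) };
  });
}

// Rebuild the full keys by reusing the prefix of the previously decoded key.
function prefixDecompress(entries) {
  let previous = '';
  return entries.map(({ shared, suffix }) => {
    previous = previous.slice(0, shared) + suffix;
    return previous;
  });
}

const keys = ['user:1001', 'user:1002', 'user:1010', 'user:2000'];
const compressed = prefixCompress(keys);
const restored = prefixDecompress(compressed);

const originalChars = keys.join('').length;
const storedChars = compressed.reduce((sum, e) => sum + e.suffix.length, 0);

console.log(compressed[1]);              // { shared: 8, suffix: '2' }
console.log(originalChars, storedChars); // 36 16
```

Index keys are kept in sorted order, so neighboring keys often share long prefixes, which is what makes this scheme effective.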

Conclusion

This article introduced the features of WiredTiger, MongoDB's default storage engine. By configuring some of the storage engine's parameters, you can tune it to balance storage space against performance. Next, we'll look at some of the parameter settings in the MongoDB configuration file.