Optimization practice of ten-million-level MongoDB data index

Xiao Li is the person in charge of the back-end of this company. Suddenly one afternoon, he received a large number of feedback from the customer service that users could not use our APP, and many operations and loads were timed out on the network.

Xiao li immediately, after receipt of the information screening question reason, but a moment more, positioning to the database in the slow query to overload the server load state, the CPU is high, so why this would happen, xiao li scared right now, after the query information, began to slow the direction to explore, sure enough, due to the business data is growing rapidly, The corresponding data table does not have the corresponding query index data. At this moment, Xiao Li smiled and added index fields to the relevant data table of the database with full confidence. However, the situation did not improve, and the online service still did not recover. As a result of experience, we had to take a degraded solution (closing the relevant query business of this table) to temporarily restore the online service.

However, the matter is not over, the problem has not been fundamentally solved, the company and himself are still very concerned about the solution of this problem, when having dinner in the evening, Xiao Li suddenly remembered that he knew a big man in the industry (Lao Bai). The problem with old white said again, old white and before long, soon professional told the small white what problems, how to solve this problem, the right to add index must first learn to do a query analysis, and then understand the ESR best practice rules (do) below, xiao li did not feel lost, because my own inadequacy But because of their own shortcomings are full of intellectual curiosity.

What are some good postures for database indexing?

MongoDB index type

A single bond index

db.user.createIndex({createdAt: 1}) 
Copy the code

CreatedAt creates a single-field index that can quickly retrieve various query requests for createdAt fields. {createdAt: 1} is a common ascending index, and {createdAt: -1} can also be used to descending an index. The ascending and descending effects are the same for single-field indexes.

Composite index

db.user.createIndex({age: 1, createdAt: 1}) 
Copy the code

You can create a joint index for multiple fields, sorting by the first field, sorting documents with the same field by the second field, and so on, so the application of sorting and indexing is also very important when making queries.

Db.user. createIndex({age: 1, createdAt: {age: 1, createdAt: Db.user. createIndex({age: 1}, createdAt: {age: 1}, createdAt: {age: 1}, createdAt: Db.user.createindex ({age: 1, createdAt: 1}) {age: 1, createdAt: 1}

Multi-value index

If the index field is an array, the created index is called a multi-key index. A multi-key index creates an index for each element of the array

// User's social login information, schema = {... SnsPlatforms :[{platform:String, openId:String, openId:String,}]} Db.user.createindex ({snsffices.openid :1})Copy the code

TTL index

Can be for a time field, specify an expiration date for display of the document (for just over a period of effective data storage, document to specify a time will be deleted, so you can finish automatically deleted data) the delete operation is safe, the data will choose in the application of slack period, so won’t delete a large number of documents cause high IO serious influence on the performance data.

Part of the index

This feature is supported only in version 3.2. Indexes are created for qualified data documents to save index storage space and write costs

Db.user.createindex ({sns.q.openID :1}) /** * add index to openId ({sns.q.openID :1}); */ db.user.createIndex({sns.q.openID :1},{partialFilterExpression:{$exists:1}})Copy the code

Sparse index

A sparse index contains only document entries with index fields, even if the index field contains null values. The index skips all documents that lack an index field.

db.user.createIndex({sns.qq.openId:1} ,{sparse:true})

Note: since version 3.2, partial indexes are provided, which can be used as a superset of sparse indexes. It is officially recommended to use partial indexes in preference to sparse indexes.

ESR index rules

Order of index fields: equal > sort > range

Equal matches fields first, Sort conditions in the middle, Range matches fields last, also for ES,ER.

Practical examples: {” class “:1,” subject “:1,” score “:1} {” class “:1,” subject “:1,” score “:1}

How do we analyze the hits and validity of this index?

The db.collection.explain() function can output a document to find the execution plan, which can help us make a better choice. Parsing functions return a lot of data, but we can mainly focus on this field

ExecutionStats performs statistics

{
    "queryPlanner": {
        "plannerVersion": 1,
        "namespace": "test.user",
        "indexFilterSet": false,
        "parsedQuery": {
            "age": {
                "$eq": 13
            }
        },
        "winningPlan": { ... },
        "rejectedPlans": []
    },
    "executionStats": {
        "executionSuccess": true,
        "nReturned": 100,
        "executionTimeMillis": 137,
        "totalKeysExamined": 48918,
        "totalDocsExamined": 48918,
        "allPlansExecution": []
    },
    "ok": 1,
}
Copy the code

NReturned Indicates the number of returned data rows

ExecutionTimeMillis Total command execution time, in milliseconds

TotalKeysExamined: indicates that MongoDB scans N index data. The number of keys checked matches the number of documents returned, which means mongod only needs to check the index key to return the result. Mongod doesn’t have to scan all the documents; only N matching documents are pulled into memory. This query result is very efficient.

TotalDocsExamined Number of document scans

The smaller the value of these fields is, the better the efficiency is. The best state is nReturned = totalKeysExamined = totalDocsExamined. If the difference is large, it indicates that there is much room for optimization. Details of the optimal execution plan returned by the query optimizer for this query (Queryplanne.winningPlan)

stage

IXSCAN: index scan: FETCH: document SHARD_MERGE: merge the data returned by each shard into a SORT LIMIT: SKIP: SKIP is used to SKIP. IDHACK: query the _id. SHARDING_FILTER: query the fragmented data using mongos. Count with db.col.explain ().count() COUNTSCAN: count Stage returns COUNT_SCAN without Index: SUBPLA: stage when an Index is used; SUBPLA: stage when an Index is not used; TEXT: stage when an Index is used; stage when a full-text Index is usedCopy the code

What we don’t want to see (pay attention to the following, problems may occur)

The oversized SKIP SUBPLA did not hit index when using $or COUNTSCAN did not hit index when executing countCopy the code

Next, let’s look at the actual execution order of a normal query

db.user.find({age:13}).skip(100).limit(100).sort({createdAt:-1})
Copy the code

As can be seen from the figure, IXSCAN index scanning is the first step, and SKIP data is the last step for filtering.

In executionStats every item has nReturned and executionTimeMillisEstimate, so that we can see from inside to outside the query execution, which step execute slowest problems.

Document design patterns for column transitions

First of all, database index is not the more the better, in MongoDB single document index upper limit, index set can not exceed 64, some well-known factories recommended no more than 10.

In a main table, because of the redundant document design, there is a lot of information that needs to be indexed, again using the social logins as an example

The conventional design

Schema = {... Qq :{openId:String}, wxApp :{openId:String}, Weibo :{openId:String}... } // Each time you add a new login type, Db.user.createindex ({q.openID :1}) db.user.createIndex({wxapp.openID :1}) db.user.createIndex({weibo.openId:1})Copy the code

Column career change design

Schema = {... SnsPlatforms :[{platform:String, openId:String,}]} snsPlatforms:[{platform:String, openId:String,}]} Don't need change index design, an index to solve all the problem of the same type db. The user. CreateIndex ({snsPlatforms. OpenId: 1, snsPlatforms. Platform: 1})Copy the code

Question: Why does openId come before PLAFORM?

This story tells how Xiao Li dealt with the problem that his knowledge could not solve. Everyone is out of their depth, so in this case, it’s a priority to solve the problem or reduce the impact of the accident.