MongoDB indexes are based on B-trees and are stored as data structures that are easy to traverse and read. An index is a structure that sorts the values of one or more fields in a collection.

A database index works much like the table of contents of a book. With an index, we do not need to read the whole book; we only need to check the table of contents to find where the content we want is and jump straight to it. This greatly improves search efficiency.

A simple example

To give you a more intuitive feel, I inserted 1 million documents into a collection on MongoDB 3.6 and analyzed the queries with explain().

scanv_rs:PRIMARY> db.users.count()
1000000
scanv_rs:PRIMARY> db.users.ensureIndex({"username": 1})
scanv_rs:PRIMARY> db.users.find({"username": "user10001"}).explain()
{
    "queryPlanner" : {
        "plannerVersion" : 1,
        "namespace" : "test.users",
        "indexFilterSet" : false,
        "parsedQuery" : {
            "username" : {
                "$eq" : "user10001"
            }
        },
        "winningPlan" : {
            "stage" : "FETCH",              # fetch the document at the position the index returned
            "inputStage" : {
                "stage" : "IXSCAN",         # index scan; without an index this would be COLLSCAN
                "keyPattern" : {
                    "username" : 1
                },
                "indexName" : "username_1", # index name
                "isMultiKey" : false,       # true if the index is built on an array field
                "multiKeyPaths" : {
                    "username" : [ ]
                },
                "isUnique" : false,
                "isSparse" : false,
                "isPartial" : false,
                "indexVersion" : 2,
                "direction" : "forward",
                "indexBounds" : {
                    "username" : [
                        "[\"user10001\", \"user10001\"]"
                    ]
                }
            }
        },
        "rejectedPlans" : [ ]
    }
}
scanv_rs:PRIMARY> db.users.find({"username": "user10001"}).explain("executionStats")

The full output of the last statement is omitted here. The key point: the "executionTimeMillis" field was 1450 ms before the index was created and 2 ms after, a difference of several hundred times. "totalDocsExamined" shows a full collection scan of 1,000,000 documents before index creation, but only 1 document after.

If you’re interested, you can compare them yourself. From above we can see the power of indexes.

What kinds of indexes are there?

Having briefly introduced indexes, let's talk about how they are classified. Indexes are mainly divided into unique indexes and sparse indexes.

A unique index ensures that the specified key for each document in the collection has a unique value.

For example, suppose we want to create a unique index on username in the collection, so that username has a distinct value in every document.

db.yourcollection.ensureIndex({"username":1}, {"unique": true})

If you then insert a document with the same username into the collection, the insert will fail with an "E11000 duplicate key error".

We often encounter this error when creating a unique index on a collection, because the collection already contains duplicate data.

The usual ways to handle this are:

Find the duplicate data (for example with an aggregation query), clean it up, and then build the index (safe to do online)

Simple and crude handling with dropDups

When creating a unique index with dropDups, if duplicate values are found, the first document is kept and the others are dropped.

db.yourcollection.ensureIndex({"username":1}, {"unique": true, "dropDups": true})

The second method was typically only used in development and test environments; be careful with it in production. Note also that the dropDups option was removed in MongoDB 3.0, so it no longer works on the 3.6 version used in these examples.
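For the first approach, here is a minimal sketch of finding duplicates with the aggregation framework, run in the mongo shell against the username field from the examples above (collection name as before):

db.yourcollection.aggregate([
    // group documents by username, count each group, and collect the _ids
    { "$group": { "_id": "$username", "count": { "$sum": 1 }, "ids": { "$push": "$_id" } } },
    // keep only the usernames that appear more than once
    { "$match": { "count": { "$gt": 1 } } }
])

Each result lists the _ids of the duplicate documents for one username; remove all but one of them before creating the unique index.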

After unique indexes, let’s talk about sparse indexes.

Because a unique index treats a missing key as the value null, we cannot insert more than one document that lacks the indexed key into the collection.

In this case, we can create a sparse index: the value may be absent, but if it is present it must be unique. We only need to add the sparse option when creating the index.

For example, suppose we want username to be optional, but unique whenever it is provided:

db.yourcollection.ensureIndex({"username": 1}, {"unique": true, "sparse": true})

The above are single-key indexes. MongoDB also offers compound indexes over multiple keys, full-text indexes, and geospatial indexes; for reasons of space we will not go into them here.

How to build an index?

Having covered classification, let's talk about how to create an index. Creating an index is time- and resource-consuming. By default, building an index blocks read and write requests on the database until the build completes.

If you want the database to keep serving read and write requests while the index is being built, you need to specify the background option.

For example, on a standalone server we can set background to true.

db.yourcollection.ensureIndex({'username': 1}, {background: true})

This approach takes a long time, but does not lock the database so that other operations can run.

For small collections, we can also do this at the replica-set level: create the index on the primary node and let it replicate to the secondaries.

However, for collections with a large amount of data, we should build the index on each node separately, so that index building does not disrupt all the replicas at once.

The steps for building an index node by node are as follows:

  1. Shut down a secondary node and restart it as a standalone server

  2. Build the index on this standalone node

  3. Add the node back to the replica set

  4. Repeat the above steps for each remaining secondary

For the primary node, we can either step it down (failing over to a secondary) and follow the same procedure, or build the index on it directly (which has some impact on performance). In this way we greatly improve the safety and stability of index building.
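A rough sketch of the per-node procedure above; the ports, data path, and replica-set name are hypothetical, and the exact flags depend on your deployment:

# 1. Restart the secondary as a standalone server (omit --replSet, use a different port)
mongod --dbpath /data/db --port 27117

# 2. Build the index against the standalone instance
mongo --port 27117 --eval 'db.getSiblingDB("test").users.ensureIndex({"username": 1})'

# 3. Restart with the original replica-set configuration so the node rejoins and catches up
mongod --dbpath /data/db --port 27017 --replSet scanv_rs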

I once saw engineers build an index without splitting the work node by node, which saturated several DB nodes and took them out of service. If environmental constraints make a rolling build impossible, schedule the operation for the database's idle hours.

When to use an index?

In most scenarios, we do need indexes to query efficiently.

But sometimes we should consider whether an index is really necessary: an indexed query requires two lookups — one for the index entry, and one to fetch the corresponding document via the index pointer — whereas a full table scan requires only one.

Let’s compare and contrast where indexes work and don’t work.

From the figure above, we can see that indexes suit large collections, large documents, and selective queries; they do not suit small collections, small documents, and non-selective queries.
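As an illustration with the users collection from earlier (the gender field is hypothetical):

// selective: matches at most a handful of documents, the index pays off
db.users.find({"username": "user10001"})

// non-selective: matches roughly half the collection; walking the index
// and then fetching every matching document can be slower than one full scan
db.users.find({"gender": "male"})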

Some suggestions

Some suggestions about indexes:

  1. Learn to use explain() for analysis: compare indexed and non-indexed runs, the number of documents examined, the milliseconds spent, etc.

  2. Pay attention to the read/write ratio, because adding an index can affect write performance if the application writes too much and reads too little

  3. Build indexes on fields with high cardinality (e.g. email or username, not gender)

  4. A $or query is a union of two independent queries and is less efficient than using $in

  5. The $ne and $nin operators cannot use an index effectively

  6. When designing a compound index, put the equality-match fields first and the range-match fields after (for example, y > 10 && y < 100)
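Suggestions 4 and 6 can be sketched in the shell; the x and y fields here are hypothetical:

// 4. prefer $in over $or for alternatives on the same field
db.users.find({"username": {"$in": ["user10001", "user10002"]}})                  // better
db.users.find({"$or": [{"username": "user10001"}, {"username": "user10002"}]})

// 6. equality field first, range field second in the compound index
db.yourcollection.ensureIndex({"x": 1, "y": 1})
db.yourcollection.find({"x": "a", "y": {"$gt": 10, "$lt": 100}})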