Cause: MongoDB is used in the development process, because some associations will use the aggregate instruction of MongoDB. However, when the sequence of aggregate instruction is changed, the query results are inconsistent, leading to the problem of not checking. Therefore, through the analysis of MongoDB’s execution plan, let’s have a look at the execution process of Aggregate and the difference between it and find

The query statement is as follows:

db.classifiedOperationLog.aggregate([
	{$sort: {createDate: -1}},
	{$skip:0},
	{$limit:5},
	{$project: {_id:1.createDate:1}}]);Copy the code

The find directive

We write the following statement with the find directive:

db.classifiedOperationLog.find({},{_id:1.createDate:1})
	.sort({createDate: -1})
	.skip(0)
	.limit(5)
Copy the code

The query result is as follows:

{ 
    "_id" : "6CCC129FC8BD4BA1B9F89B053E86112E"."createDate" : ISODate("The 2020-12-24 T11:25:04. 675 + 0800")} {"_id" : "8B325EC1E7DA4AD390EC301EF5012BE0"."createDate" : ISODate("The 2020-12-24 T11:25:00. 176 + 0800")} {"_id" : "498781F2606D4001977BDB8FAE038DF9"."createDate" : ISODate(": the 2020-12-24 T11 were. 748 + 0800")} {"_id" : "FBE41B432313469588CE5A80BAB7F7BB"."createDate" : ISODate("The 2020-12-24 T09:52:27. 219 + 0800")} {"_id" : "37D2D451BC53489E945828559AE1EDC0"."createDate" : ISODate("The 2020-12-24 T08:57:12. 702 + 0800")}Copy the code

Let’s look at the find execution plan and execute db.collection.explain():

db.classifiedOperationLog.find({},{_id:1.createDate:1})
	.sort({createDate: -1})
	.skip(0)
	.limit(5)
	.explain()
Copy the code
By default, Explain consists of two sections, queryPlanner and serverInfo
// If executionStats or allPlansExecution is used, the executionStats message is also returned
{ 
    "queryPlanner" : {
        "plannerVersion" : 1.0.// Query the schedule version
        "namespace" : "xxx.classifiedOperationLog".// The queried object
        "indexFilterSet" : false.// Whether to use index filtering
        "parsedQuery" : { 							// Parse the query, i.e., what are the filter criteria

        }, 
        "winningPlan" : {							// Best execution plan
            "stage" : "LIMIT".// Use limit to limit the number of returns
            "limitAmount" : 5.0./ / limit restrictions
            "inputStage" : {
                "stage" : "PROJECTION".// Use skip to skip
                "transformBy" : {						// Field filtering
                    "_id" : 1.0."createDate" : 1.0
                }, 
                "inputStage" : {
                    "stage" : "FETCH".// Check out the document
                    "inputStage" : {
                        "stage" : "IXSCAN".// index scan, create date add index, where sort is index
                        "keyPattern" : {
                            "createDate" : 1.0
                        }, 
                        "indexName" : "createDate_1".// Index name
                        "isMultiKey" : false.// Whether to compound index
                        "multiKeyPaths" : {
                            "createDate": []},"isUnique" : false."isSparse" : false."isPartial" : false."indexVersion" : 2.0."direction" : "backward"."indexBounds" : {
                            "createDate" : [
                                "[MaxKey, MinKey]"}}}}},"rejectedPlans" : [							// Reject execution plan, not here]},"serverInfo" : {								// Server information, including host name, port, version, etc
        "host" : "node-0"."port" : 28000.0."version" : "3.6.8"."gitVersion" : "6bc9ed599c3fa164703346a22bad17e33fa913e4"
    }, 
    "ok" : 1.0."operationTime" : Timestamp(1608789303, 1)}Copy the code

The common operations of stage are described as follows:

  • COLLSCAN Collection scan
  • IXSCAN Index scan
  • FETCH checks out the document
  • SHARD_MERGE Result of merging fragments
  • SHARDING_FILTER shards to filter isolated documents
  • LIMIT Use LIMIT to LIMIT the number of returns
  • PROJECTION is skipped using skip
  • IDHACK Queries the id
  • COUNT uses things like db.col.explain ().count() to perform COUNT operations
  • COUNTSCAN count Stage return for count without Index
  • COUNT_SCAN count uses the stage return of Index for count
  • SUBPLA does not use the stage return of the indexed $or query
  • TEXT The stage return from a query using a full-text index
  • PROJECTION limits the return of stage when a field is returned

2. The Aggregate instruction

Before we look at aggregate queries, let’s look at an interesting phenomenon. To demonstrate the effect, let’s change skip to 2 and limit to 3

Find query result

db.classifiedOperationLog.find({},{_id:1.createDate:1})
	.sort({createDate: -1})
	.skip(2)
	.limit(3)
Copy the code
{ 
    "_id" : "498781F2606D4001977BDB8FAE038DF9"."createDate" : ISODate(": the 2020-12-24 T11 were. 748 + 0800")} {"_id" : "FBE41B432313469588CE5A80BAB7F7BB"."createDate" : ISODate("The 2020-12-24 T09:52:27. 219 + 0800")} {"_id" : "37D2D451BC53489E945828559AE1EDC0"."createDate" : ISODate("The 2020-12-24 T08:57:12. 702 + 0800")}Copy the code

Aggregate after engraving

db.classifiedOperationLog.aggregate([
	{$sort: {createDate: -1}},
	{$skip:2},
	{$limit:3},
	{$project: {_id:1.createDate:1}}]);Copy the code
{ 
    "_id" : "498781F2606D4001977BDB8FAE038DF9"."createDate" : ISODate(": the 2020-12-24 T11 were. 748 + 0800")} {"_id" : "FBE41B432313469588CE5A80BAB7F7BB"."createDate" : ISODate("The 2020-12-24 T09:52:27. 219 + 0800")} {"_id" : "37D2D451BC53489E945828559AE1EDC0"."createDate" : ISODate("The 2020-12-24 T08:57:12. 702 + 0800")}Copy the code

It can be seen that the results obtained by aggregate query are consistent with find results. At this point, we move sort∗∗ down to ∗∗sort** down to **sort∗∗ down to after ∗∗limit

db.classifiedOperationLog.aggregate([
	{$skip:2},
	{$limit:3},
	{$sort: {createDate: -1}},
	{$project: {_id:1.createDate:1}}]);Copy the code
{ 
    "_id" : "A3D65CCB5F7144F080B9B972A9595F04"."createDate" : ISODate("The 2020-09-22 T20:57:41. 260 + 0800")} {"_id" : "7C7F4D28F2794B3CB4E361ACAF797646"."createDate" : ISODate("The 2020-09-22 T20: state. 702 + 0800")} {"_id" : "4344982784CE454B83D73764986F65E1"."createDate" : ISODate("The 2020-09-22 T20: there. 409 + 0800")}Copy the code

As you can see, there are still 3 pieces of data in reverse order, but with different time and IDS, the result is worse, which seems not what we want. At this point, we move skip∗∗ to ∗skip** to **skip∗∗ below the ∗∗limit

db.classifiedOperationLog.aggregate([	
	{$limit:3},
	{$skip:2},
	{$sort: {createDate: -1}},
	{$project: {_id:1.createDate:1}}]);Copy the code
{ 
    "_id" : "4344982784CE454B83D73764986F65E1"."createDate" : ISODate("The 2020-09-22 T20: there. 409 + 0800")}Copy the code

As the data becomes a single line and is not in the expected query result, we move skip∗∗ to ∗skip** to **skip∗∗ to below ∗∗sort

db.classifiedOperationLog.aggregate([	
	{$limit:3},
	{$sort: {createDate: -1}},
	{$skip:2},
	{$project: {_id:1.createDate:1}}]);Copy the code
{ 
    "_id" : "EA42C914214C4C18A6B788F897C5F29A"
}
Copy the code

The data changed again and again, but as we tried, the pattern became clearer and clearer, so let’s look at the aggregate execution plan

db.classifiedOperationLog.aggregate([
	{$sort: {createDate: -1}},
	{$skip:2},
	{$limit:3},
	{$project: {_id:1.createDate:1}}, {explain:true});								// Pay attention to the execution plan parameters
Copy the code
{ 
    "stages" : [								// Query steps
        {
            "$cursor" : {							// select * from (select * from (select * from));
                "query" : {							// Query parameters are not available here

                }, 
                "sort" : {
                    "createDate" : NumberInt(- 1)		                / / sorting
                }, 
                "limit" : NumberLong(5), 				        Skip +limit = skip+limit
                "fields" : {							// $project is implemented in step 3, which should be optimized to reduce network IO. Check the implementation of MongoDB for further details
                    "createDate" : NumberInt(1),                                
                    "_id" : NumberInt(1)},"queryPlanner" : {						// This is similar to find
                    "plannerVersion" : NumberInt(1), 
                    "namespace" : "xx.classifiedOperationLog"."indexFilterSet" : false."parsedQuery": {},"winningPlan" : {					
                        "stage" : "FETCH"."inputStage" : {
                            "stage" : "IXSCAN".// Index scan is performed with find
                            "keyPattern" : {
                                "createDate" : NumberInt(1)},"indexName" : "createDate_1"."isMultiKey" : false."multiKeyPaths" : {
                                "createDate": []},"isUnique" : false."isSparse" : false."isPartial" : false."indexVersion" : NumberInt(2), 
                            "direction" : "backward"."indexBounds" : {
                                "createDate" : [
                                    "[MaxKey, MinKey]"]}}},"rejectedPlans": []}}}, {"$skip" : NumberLong(2)				                  //2
        }, 
        {
            "$project" : {							  //3
                "_id" : true."createDate" : true}}]."ok" : 1.0."operationTime" : Timestamp(1608792723, 3)."$gleStats" : {
        "lastOpTime" : Timestamp(0, 0)."electionId" : ObjectId("7fffffff0000000000000002")},"$configServerState" : {
        "opTime" : {
            "ts" : Timestamp(1608792719, 3)."t" : NumberLong(1)}},"$clusterTime" : {
        "clusterTime" : Timestamp(1608792723, 3)."signature" : {
            "hash" : BinData(0."AAAAAAAAAAAAAAAAAAAAAAAAAAA="), 
            "keyId" : NumberLong(0)}}}Copy the code

Aggregate is actually the aggregation pipeline of MongoDB, and project∗∗, ∗∗project**, **project∗∗, ∗∗limit, and $sort are all pipe operators. MongoDB’s aggregation pipeline passes MongoDB documents to the next pipeline after one pipeline has finished processing, and the pipeline operation can be repeated.

Ps: Blog before the group to count again with two groups to achieve, its implementation principle is the pipeline repeat. MONGODB03 – Grouping count/grouping de-count (spring-data-mongodb)

Several operators commonly used in aggregation frameworks:

  • $project: Modifies the structure of the input document. It can be used to rename, add, or delete domains, create computed results, and nest documents.
  • Match: Filters data and outputs only the documents that meet the conditions. Match: Filters data and outputs only the documents that meet the conditions. Match: Filters data and outputs only the documents that meet the conditions. Match uses MongoDB’s standard query operations.
  • $limit: Limits the number of documents returned by the MongoDB aggregation pipeline.
  • $skip: Skips a specified number of documents in the aggregation pipe and returns the remaining documents.
  • $unwind: Unwinds an array type field in a document into multiple pieces, each containing a value from the array.
  • $group: Groups documents in a collection that can be used for statistical results.
  • $sort: Outputs by sorting the input documents.
  • $geoNear: Outputs an ordered document close to a geographic location.

Now that we know how aggregate works, let’s change the pipe sequence and take a look at the new execution plan

db.classifiedOperationLog.aggregate([
	{$skip:2},
	{$limit:3},
	{$sort: {createDate: -1}},
	{$project: {_id:1.createDate:1}}, {explain:true});
Copy the code
{ 
    "stages": [{"$cursor" : {						Run skip+limit to select 5 items of data
                "query": {},"limit" : NumberLong(5), 	  		
                "fields" : {
                    "createDate" : NumberInt(1), 
                    "_id" : NumberInt(1)},"queryPlanner" : {					
                    "plannerVersion" : NumberInt(1), 
                    "namespace" : "xxx.classifiedOperationLog"."indexFilterSet" : false."parsedQuery": {},"winningPlan" : {
                        "stage" : "COLLSCAN".// Set scan to retrieve the limit number document
                        "direction" : "forward"
                    }, 
                    "rejectedPlans": []}}}, {"$skip" : NumberLong(2)					//2
        }, 
        {
            "$sort" : {							//3. Sort the results
                "sortKey" : {
                    "createDate" : NumberInt(- 1}}}, {"$project" : {						//4
                "_id" : true."createDate" : true}}]."ok" : 1.0."operationTime" : Timestamp(16087943031),"$gleStats" : {
        "lastOpTime" : Timestamp(0, 0)."electionId" : ObjectId("7fffffff0000000000000002")},"$configServerState" : {
        "opTime" : {
            "ts" : Timestamp(1608794306, 3)."t" : NumberLong(1)}},"$clusterTime" : {
        "clusterTime" : Timestamp(1608794306And 4),"signature" : {
            "hash" : BinData(0."AAAAAAAAAAAAAAAAAAAAAAAAAAA="), 
            "keyId" : NumberLong(0)}}}Copy the code

Through the analysis of the above execution plan, the working mechanism of find and aggregate can be understood. Partners can select corresponding instructions as required. Aggregate can meet the requirements of some specific scenarios through the sequence and reuse of parameters.

Reference links:

www.runoob.com/mongodb/mon…

Blog.csdn.net/user_longli…

Docs.mongodb.com/manual/aggr…