[Mongo series] MongoDB learning, part 4: an overview of aggregation
What is aggregated data?
Let’s look at aggregate data first
Data aggregation is the merging of data from different data sources.
Clustering, also known as cluster analysis, is a technique for analyzing statistical data.
It is widely used in many fields, including machine learning, data mining, pattern recognition, image analysis and bioinformatics.
What is an aggregated query?
Aggregation operations process data records and return computed results.
They group values from multiple documents together and can perform a variety of operations on the grouped data to return a single result.
Aggregation operations generally include the following three categories:
- Single-purpose aggregation methods
- Aggregation pipeline
- MapReduce
Docs.mongodb.com/manual/aggr…
Single-purpose aggregation
MongoDB itself provides the following single-purpose aggregation methods. Compared with the aggregation pipeline and MapReduce, they are less flexible and less feature-rich.
- db.collection.estimatedDocumentCount()
Returns a rough count of the documents in a collection; the result is an estimate
- db.collection.count()
Counts the documents in a collection, or the documents that match a query (db.collection.countDocuments() is the variant that counts through an aggregation)
- db.collection.distinct()
Returns the distinct values that a field has
For example:
> db.users.find()
{ "_id" : ObjectId("61584aeeee74dfe04dac57e9"), "name" : "xiaokeai", "age" : 25, "hobby" : "reading", "infos" : { "tall" : 175, "height" : 62 }, "school" : "cs" }
{ "_id" : ObjectId("615a56d6bc6afecd2cff8f96"), "name" : "xiaozhu", "age" : 15, "hobby" : "basketball", "infos" : { "tall" : 190, "height" : 70 }, "school" : "sh" }
{ "_id" : ObjectId("615a5856d988690b07c69f64"), "name" : "xiaopang" }
{ "_id" : ObjectId("615a5917d988690b07c69f66"), "name" : "nancy", "age" : 25, "hobby" : "study", "infos" : { "tall" : 175, "height" : 60 }, "school" : "hn" }
{ "_id" : ObjectId("615a5917d988690b07c69f67"), "name" : "job", "age" : 19, "hobby" : "basketball", "infos" : { "tall" : 170, "height" : 70 }, "school" : "nj" }
> db.users.distinct("age")
[ 15, 19, 25 ]
In the example above, db.users.distinct("age") returns the distinct values of the age field.
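As a rough illustration of what counting and distinct compute, here is a minimal in-memory sketch in plain JavaScript. It runs against an array copied from the documents above rather than a real MongoDB server, and the helper names `countDocs` and `distinctValues` are made up for this example.

```javascript
// Sample documents copied from the users collection above (_id and some fields omitted).
const users = [
  { name: "xiaokeai", age: 25, hobby: "reading" },
  { name: "xiaozhu", age: 15, hobby: "basketball" },
  { name: "xiaopang" },
  { name: "nancy", age: 25, hobby: "study" },
  { name: "job", age: 19, hobby: "basketball" },
];

// Hypothetical helper: counts documents that satisfy a predicate (all of them by default).
function countDocs(docs, predicate = () => true) {
  return docs.filter(predicate).length;
}

// Hypothetical helper: collects the distinct values of one field,
// skipping documents that lack the field, and sorts them for stable output.
function distinctValues(docs, field) {
  const seen = new Set();
  for (const doc of docs) {
    if (doc[field] !== undefined) seen.add(doc[field]);
  }
  return [...seen].sort((a, b) => (a < b ? -1 : a > b ? 1 : 0));
}

console.log(countDocs(users));                      // 5
console.log(countDocs(users, (d) => d.age === 25)); // 2
console.log(distinctValues(users, "age"));          // [ 15, 19, 25 ]
console.log(distinctValues(users, "hobby"));        // [ 'basketball', 'reading', 'study' ]
```

Note that xiaopang has no age field, so that document contributes nothing to the distinct ages.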
Aggregation pipeline
Docs.mongodb.com/manual/core…
An aggregation pipeline consists of one or more stages, and each stage transforms the documents as they pass through it. It can be thought of as a pipe in Linux, where the input of the next command is the output of the previous one.
The method takes the form db.collection.aggregate(pipeline, options):
- pipeline
A sequence of data aggregation stages. Except for $out, $merge, and $geoNear, a stage may appear multiple times in a pipeline
- options
Optional. Additional parameters for the aggregate operation
These include the query plan (explain), whether to use temporary files, cursor settings, the maximum operation time, read and write concerns, index hints, and so on
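The pipe analogy can be sketched in a few lines of plain JavaScript. This is only an in-memory model, not how MongoDB implements pipelines; `runPipeline` and the two stage functions are hypothetical names introduced for illustration.

```javascript
// Hypothetical in-memory model of a pipeline: each stage is a function
// from an array of documents to a new array of documents.
function runPipeline(docs, stages) {
  // The output of one stage becomes the input of the next, like a shell pipe.
  return stages.reduce((current, stage) => stage(current), docs);
}

// Stage that keeps only adults; it may emit fewer documents than it receives.
const keepAdults = (docs) => docs.filter((d) => d.age >= 18);

// Stage that reshapes each document down to its name.
const onlyName = (docs) => docs.map((d) => ({ name: d.name }));

const input = [
  { name: "xiaokeai", age: 25 },
  { name: "xiaozhu", age: 15 },
  { name: "job", age: 19 },
];

const result = runPipeline(input, [keepAdults, onlyName]);
console.log(result); // [ { name: 'xiaokeai' }, { name: 'job' } ]
```

Because every stage has the same shape (documents in, documents out), stages can be reordered or repeated freely in this model.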
Commonly used aggregation pipeline stages
The commonly used aggregation pipeline stages are summarized below
Stage | Description |
---|---|
$match | Filters documents |
$group | Groups documents |
$project | Selects the fields to display |
$lookup | Joins with another collection |
$unwind | Expands an array into separate documents |
$out | Writes the results to a new collection |
$count | Counts documents |
$sort, $skip, $limit | Sorting and paging |
For the other stages, see the official documentation: docs.mongodb.com/manual/refe…
Take $count as an example
The first stage, $group, groups the data. In an aggregation pipeline the output of one stage is the input of the next, so the $project stage that follows selects which fields of the grouped result to display
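To make the chaining concrete, the following is a minimal sketch in plain JavaScript that imitates a $group stage followed by a $project stage on part of the users data from earlier. It is an in-memory approximation, not MongoDB's implementation; the functions `groupByAge` and `projectAgeTotal` are hypothetical, and the shell pipeline in the comment is only the rough equivalent this sketch is assumed to mirror.

```javascript
// Rough shell equivalent (an assumption, shown for orientation only):
// db.users.aggregate([
//   { $group: { _id: "$age", total: { $sum: 1 } } },
//   { $project: { _id: 0, age: "$_id", total: 1 } }
// ])

const users = [
  { name: "xiaokeai", age: 25 },
  { name: "xiaozhu", age: 15 },
  { name: "nancy", age: 25 },
  { name: "job", age: 19 },
];

// Imitates $group: one output document per distinct age, with a running count.
function groupByAge(docs) {
  const totals = new Map();
  for (const doc of docs) {
    totals.set(doc.age, (totals.get(doc.age) || 0) + 1);
  }
  return [...totals].map(([age, total]) => ({ _id: age, total }));
}

// Imitates $project: exposes _id under the name age and keeps total.
function projectAgeTotal(docs) {
  return docs.map((d) => ({ age: d._id, total: d.total }));
}

// The output of the first stage is the input of the second.
const grouped = groupByAge(users);
const projected = projectAgeTotal(grouped);
console.log(projected);
// [ { age: 25, total: 2 }, { age: 15, total: 1 }, { age: 19, total: 1 } ]
```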
MapReduce
Docs.mongodb.com/manual/core…
A MapReduce operation splits the processing of a large amount of data into parallel work, then merges the partial results together
MapReduce has the following two phases:
- The map phase, which groups document data that shares the same key
- The reduce phase, which combines the results of the map phase to produce the aggregated output
Here’s an example from the official website
The query filters for documents with status: "A", emit maps each cust_id to its amount, and the results are finally written to a new collection named order_totals
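The flow described above can be imitated in memory with plain JavaScript. This is a sketch under assumptions, not MongoDB's implementation: the sample orders are invented for illustration, and `mapReduceInMemory` is a hypothetical helper. Only the field names cust_id, amount, and status come from the description above.

```javascript
// Invented sample data; only the field names come from the description above.
const orders = [
  { cust_id: "A123", amount: 500, status: "A" },
  { cust_id: "A123", amount: 250, status: "A" },
  { cust_id: "B212", amount: 200, status: "A" },
  { cust_id: "A123", amount: 300, status: "D" },
];

// Hypothetical helper that imitates query -> map -> reduce.
function mapReduceInMemory(docs, map, reduce, query) {
  const emitted = new Map(); // key -> list of emitted values
  const emit = (key, value) => {
    if (!emitted.has(key)) emitted.set(key, []);
    emitted.get(key).push(value);
  };
  // Query step: only matching documents reach map.
  docs.filter(query).forEach((doc) => map(doc, emit));
  // Reduce step: collapse each key's values into a single value.
  return [...emitted].map(([key, values]) => ({ _id: key, value: reduce(key, values) }));
}

const orderTotals = mapReduceInMemory(
  orders,
  (doc, emit) => emit(doc.cust_id, doc.amount),           // map: key = cust_id, value = amount
  (key, values) => values.reduce((sum, v) => sum + v, 0), // reduce: sum per customer
  (doc) => doc.status === "A"                             // query: status "A" only
);

console.log(orderTotals);
// [ { _id: 'A123', value: 750 }, { _id: 'B212', value: 200 } ]
```

This sketch calls reduce exactly once per key. A real MongoDB reduce function may be invoked more than once for the same key, so it should be written to tolerate being re-applied to its own output, as a plain sum is.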
The syntax of MapReduce operations is as follows:
db.collection.mapReduce(<map>, <reduce>, { out: <collection>, query: <document>, sort: <document>, limit: <number>, finalize: <function>, scope: <document>, jsMode: <boolean>, verbose: <boolean>, bypassDocumentValidation: <boolean> })
- map
Splits the data into key-value pairs and passes them to the reduce function
- reduce
Performs the aggregation on the values of each key
- out
Specifies where the results are written, for example a named collection
- query
Optional. A filter that selects which documents are passed to map
- sort
Optional. Sorts the input documents before they are passed to map
- limit
Optional. Limits the number of documents passed to map
- finalize
Optional. Modifies the output of reduce before it is returned
- scope
Optional. Specifies global variables that are accessible in the map, reduce, and finalize functions
- jsMode
Optional; defaults to false. Whether intermediate data is converted to BSON between the map and reduce steps
- verbose
Optional; defaults to false. Whether to include timing information in the result
- bypassDocumentValidation
Optional. Skips document validation during the operation
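One way to read these options is as an order of operations: query, then sort, then limit, then map and reduce, then finalize. The sketch below imitates that order in plain JavaScript. It is a simplified model rather than MongoDB's implementation: `runMapReduce`, the comparator-style sort, passing scope as an extra argument (MongoDB exposes it as global variables), and the sample data are all assumptions made for illustration.

```javascript
// Hypothetical in-memory model of the mapReduce options described above.
function runMapReduce(docs, map, reduce, options = {}) {
  const {
    query = () => true,
    sort,
    limit = Infinity,
    finalize = (key, value) => value,
    scope = {},
  } = options;

  let input = docs.filter(query);           // query: choose the documents sent to map
  if (sort) input = [...input].sort(sort);  // sort: order the input (a comparator in this model)
  input = input.slice(0, limit);            // limit: cap the number of documents sent to map

  const groups = new Map();
  const emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  input.forEach((doc) => map(doc, emit, scope)); // map: emit key-value pairs

  return [...groups].map(([key, values]) => {
    const reduced = reduce(key, values, scope);                // reduce: combine one key's values
    return { _id: key, value: finalize(key, reduced, scope) }; // finalize: post-process the result
  });
}

// Invented sample data for the usage example.
const orders = [
  { cust_id: "A123", amount: 500, status: "A" },
  { cust_id: "B212", amount: 200, status: "A" },
  { cust_id: "A123", amount: 250, status: "A" },
  { cust_id: "B212", amount: 100, status: "D" },
];

const totals = runMapReduce(
  orders,
  (doc, emit) => emit(doc.cust_id, doc.amount),
  (key, values) => values.reduce((sum, v) => sum + v, 0),
  {
    query: (doc) => doc.status === "A",
    sort: (a, b) => b.amount - a.amount, // largest orders first
    limit: 2,                            // only the two largest orders reach map
    scope: { currency: "USD" },
    finalize: (key, value, scope) => `${value} ${scope.currency}`,
  }
);

console.log(totals); // [ { _id: 'A123', value: '750 USD' } ]
```

Because limit is applied before map, customer B212 never reaches the map step in this run, which is why only A123 appears in the output.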
Aggregation pipeline versus MapReduce
Comparison | Aggregation pipeline | MapReduce |
---|---|---|
Purpose | Designed to improve the performance and usability of aggregation tasks | Designed for processing large data sets; more convenient when the data is huge |
Key features | Pipeline operators can be repeated as needed, and a pipeline stage does not have to produce one output document for each input document | In addition to grouping operations, can perform complex aggregation tasks and incremental aggregation on continuously growing data sets |
Flexibility | Limited to the operators and expressions supported by the aggregation pipeline | Custom map, reduce, and finalize JavaScript functions provide flexibility in the aggregation logic |
Output | Returns results as a cursor; if the pipeline includes the $out or $merge stage, the cursor is empty | Returns results with various options (inline, new collection, merge, replace, reduce) |
Sharding | Supports both non-sharded and sharded input collections | Supports both non-sharded and sharded input collections |
For a more detailed comparison, see the official documentation: docs.mongodb.com/manual/refe…
Likes, follows, and bookmarks are welcome
Friends, your support and encouragement are what keep me sharing and improving the quality of these posts
All right, that’s it for this time
Technology is open, and our mindset should be even more open. Embrace change, live in the sun, and strive to move forward.
I am Nezha. Likes are welcome, and see you next time ~