[Mongo series] MongoDB learning, part four: an overview of aggregation

What is aggregated data?

Let's look at aggregated data first

Data aggregation is the merging of data from different data sources.

Clustering, also known as cluster analysis, is a technique for analyzing statistical data.

It is widely used in many fields, including machine learning, data mining, pattern recognition, image analysis and bioinformatics.

What is an aggregated query?

Aggregation operations process data records and return computed results

They group values from multiple documents together, and you can perform various operations on the grouped data to return a single result

Aggregation operations generally include the following three categories:

  • Single-purpose aggregation
  • Aggregation pipeline
  • MapReduce

Docs.mongodb.com/manual/aggr…

Single-purpose aggregation

MongoDB itself provides the following single-purpose aggregation functions. Compared with the aggregation pipeline and mapReduce, these functions are less flexible and offer fewer features

  • db.<collection name>.estimatedDocumentCount()

Returns a rough count of the documents; the result is an estimate

  • db.<collection name>.count()

Counts the documents in a collection, optionally only those matching a query

  • db.<collection name>.distinct()

Returns the distinct values of a field

For example:

> db.users.find()
{ "_id" : ObjectId("61584aeeee74dfe04dac57e9"), "name" : "xiaokeai", "age" : 25, "hobby" : "reading", "infos" : { "tall" : 175, "height" : 62 }, "school" : "cs" }
{ "_id" : ObjectId("615a56d6bc6afecd2cff8f96"), "name" : "xiaozhu", "age" : 15, "hobby" : "basketball", "infos" : { "tall" : 190, "height" : 70 }, "school" : "sh" }
{ "_id" : ObjectId("615a5856d988690b07c69f64"), "name" : "xiaopang" }
{ "_id" : ObjectId("615a5917d988690b07c69f66"), "name" : "nancy", "age" : 25, "hobby" : "study", "infos" : { "tall" : 175, "height" : 60 }, "school" : "hn" }
{ "_id" : ObjectId("615a5917d988690b07c69f67"), "name" : "job", "age" : 19, "hobby" : "basketball", "infos" : { "tall" : 170, "height" : 70 }, "school" : "nj" }

> db.users.distinct("age")
[ 15, 19, 25 ]

In the example above, db.users.distinct("age") returns the distinct values of the age field
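As a rough illustration of what these helpers return, the sketch below emulates count() and distinct("age") over the sample documents in plain JavaScript. It is not how MongoDB computes them; the `users` array and the `distinct` helper are stand-ins defined only for this example.

```javascript
// In-memory copy of the sample documents (only the fields used here).
const users = [
  { name: "xiaokeai", age: 25 },
  { name: "xiaozhu", age: 15 },
  { name: "xiaopang" }, // this document has no age field
  { name: "nancy", age: 25 },
  { name: "job", age: 19 },
];

// Hypothetical helper: collect the unique values of one field,
// skipping documents that do not have the field.
function distinct(docs, field) {
  const values = docs
    .filter((doc) => field in doc)
    .map((doc) => doc[field]);
  // Sorted numerically here to match the shell output shown above.
  return [...new Set(values)].sort((a, b) => a - b);
}

const total = users.length;          // plays the role of db.users.count()
const ages = distinct(users, "age"); // plays the role of db.users.distinct("age")

console.log(total); // 5
console.log(ages);  // [ 15, 19, 25 ]
```

The document without an age field still counts toward the total but contributes nothing to the distinct values.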

Aggregation pipeline

Docs.mongodb.com/manual/core…

The aggregation pipeline consists of several stages, each of which transforms the documents as they pass through. It works much like a pipe in Linux, where the output of one command is the input of the next

The syntax is db.collection.aggregate(pipeline, options), with the following parameters:

  • pipeline

A sequence of data aggregation stages. $out, $merge, and $geoNear may appear only once in a pipeline; the other stages may appear multiple times

  • options

Optional. Additional parameters for the aggregate operation

These include whether to return the query plan (explain), whether to use temporary files (allowDiskUse), the cursor, the maximum operation time (maxTimeMS), read and write concerns, index hints, and so on
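A minimal sketch of passing both arguments to db.collection.aggregate(), assuming the `users` collection shown earlier. The option values are arbitrary examples, and the call itself only runs inside the mongo shell, where `db` is defined.

```javascript
// First argument: the array of stages.
const pipeline = [
  { $match: { age: { $gte: 18 } } }, // keep users aged 18 or older
  { $sort: { age: -1 } },            // oldest first
];

// Second argument: optional settings for the whole operation.
const options = {
  allowDiskUse: true, // allow stages to spill to temporary files
  maxTimeMS: 5000,    // abort if the operation runs longer than 5 seconds
};

// `db` exists only in the mongo shell, so the call is guarded elsewhere.
if (typeof db !== "undefined") {
  printjson(db.users.aggregate(pipeline, options).toArray());
}
```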

Commonly used aggregation pipeline stages

The commonly used aggregation pipeline stages are summarized below

| Stage | Description |
| --- | --- |
| $match | Filters documents |
| $group | Groups documents |
| $project | Selects the fields to display |
| $lookup | Joins with another collection (multi-table query) |
| $unwind | Expands an array field |
| $out | Writes the results to a new collection |
| $count | Counts documents |
| $sort, $skip, $limit | Sorting and paging |

For the other stages, see the official documentation: docs.mongodb.com/manual/refe…
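To show how several of these stages combine, here is a sketch against the `users` collection from the earlier example: it groups users by hobby, counts each group, and keeps the two most common hobbies. The output field name `total` is an arbitrary choice made for this example, and the call only runs inside the mongo shell.

```javascript
const topHobbies = [
  { $match: { hobby: { $exists: true } } },          // skip users without a hobby
  { $group: { _id: "$hobby", total: { $sum: 1 } } }, // one output document per hobby
  { $sort: { total: -1 } },                          // most common hobby first
  { $limit: 2 },                                     // keep the top two
];

// `db` exists only in the mongo shell, so the call is guarded elsewhere.
if (typeof db !== "undefined") {
  printjson(db.users.aggregate(topHobbies).toArray());
}
```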

Take $count as an example

The first stage, $match, filters the data. The output of each stage in the aggregation pipeline is the input to the next stage, which here is $project, selecting the fields to display
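The chaining described above can be sketched in plain JavaScript. The `match`, `project`, and `runPipeline` helpers below are simplified stand-ins written for this example, not MongoDB's implementation; they only show that each stage receives the previous stage's output.

```javascript
const users = [
  { name: "xiaokeai", age: 25, hobby: "reading" },
  { name: "xiaozhu", age: 15, hobby: "basketball" },
  { name: "xiaopang" },
  { name: "nancy", age: 25, hobby: "study" },
  { name: "job", age: 19, hobby: "basketball" },
];

// Each stage is a function from an array of documents to an array of documents.
const match = (predicate) => (docs) => docs.filter(predicate);
const project = (fields) => (docs) =>
  docs.map((doc) => Object.fromEntries(fields.map((f) => [f, doc[f]])));

// Run the stages in order, feeding each one the previous stage's result.
const runPipeline = (docs, stages) =>
  stages.reduce((current, stage) => stage(current), docs);

const result = runPipeline(users, [
  match((doc) => doc.age >= 19), // filter first
  project(["name", "age"]),      // then keep only two fields
]);

console.log(result);
// [ { name: 'xiaokeai', age: 25 }, { name: 'nancy', age: 25 }, { name: 'job', age: 19 } ]
```

In the mongo shell, a roughly equivalent pipeline would be db.users.aggregate([{ $match: { age: { $gte: 19 } } }, { $project: { _id: 0, name: 1, age: 1 } }]).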

MapReduce

Docs.mongodb.com/manual/core…

The MapReduce operation splits a large data-processing job into pieces that are handled by multiple threads in parallel, then merges the results together

MapReduce has the following two phases:

  • The map phase, which brings together document data with the same key
  • The reduce phase, which combines the results of the map operation to produce the statistical output

Here’s an example from the official website

In it, map calls emit with cust_id and amount for each document that passes the status: "A" filter, and the results are finally written to a new collection named order_totals
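The operation described above can be sketched for the mongo shell as follows. The collection name `orders` and the variable names `mapFn`, `reduceFn`, and `mrOptions` are assumptions made for this example; `emit` is supplied by MongoDB when the map function runs on the server.

```javascript
// Map: runs once per input document; `this` is the current document.
const mapFn = function () {
  emit(this.cust_id, this.amount);
};

// Reduce: combines all values emitted for one key into a single value.
const reduceFn = function (key, values) {
  let sum = 0;
  for (const amount of values) {
    sum += amount;
  }
  return sum;
};

const mrOptions = {
  query: { status: "A" }, // only documents with status "A" reach map
  out: "order_totals",    // write the results to this collection
};

// `db` exists only in the mongo shell, so the call is guarded elsewhere.
if (typeof db !== "undefined") {
  db.orders.mapReduce(mapFn, reduceFn, mrOptions);
  printjson(db.order_totals.find().toArray());
}
```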

The syntax of MapReduce operations is as follows:

db.collection.mapReduce(
  <map>,
  <reduce>,
  {
    out: <collection>,
    query: <document>,
    sort: <document>,
    limit: <number>,
    finalize: <function>,
    scope: <document>,
    jsMode: <boolean>,
    verbose: <boolean>,
    bypassDocumentValidation: <boolean>
  }
)

  • map

Splits the data into key-value pairs and hands them to the reduce function

  • reduce

Performs a statistical operation on the values for each key

  • out

Specifies where the results are stored, for example the name of a collection

  • query

Optional. A filter condition; only the matching documents are passed to map

  • sort

Optional. Sorts the input documents before they are sent to map

  • limit

Optional. Limits the number of documents sent to map

  • finalize

Optional. Modifies the output of reduce before it is returned

  • scope

Optional. Specifies global variables accessible in map, reduce, and finalize

  • jsMode

Optional; defaults to false. Whether to convert intermediate data to BSON format between the map and reduce phases

  • verbose

Optional; defaults to false. Whether to include timing information in the result

  • bypassDocumentValidation

Optional. Whether to skip document validation during the operation
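To make the roles of query, sort, limit, and finalize more concrete, here is a simplified in-memory sketch. `mapReduceSketch` is a hypothetical helper written for this example, not MongoDB's API: its options are plain JavaScript functions rather than query documents, map receives `emit` as an argument instead of using a global, and the sample `orders` data is made up.

```javascript
// Simplified stand-in for mapReduce, for illustration only.
function mapReduceSketch(docs, map, reduce, options = {}) {
  const { query = () => true, sort, limit = Infinity, finalize } = options;

  // query, sort, and limit decide which documents reach map.
  let input = docs.filter(query);
  if (sort) input = [...input].sort(sort);
  input = input.slice(0, limit);

  // map emits key-value pairs; the values are collected per key.
  const groups = new Map();
  const emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  input.forEach((doc) => map(doc, emit));

  // reduce folds each key's values; finalize post-processes the reduced value.
  const out = {};
  for (const [key, values] of groups) {
    const reduced = reduce(key, values);
    out[key] = finalize ? finalize(key, reduced) : reduced;
  }
  return out;
}

const orders = [
  { cust_id: "A123", amount: 500, status: "A" },
  { cust_id: "A123", amount: 250, status: "A" },
  { cust_id: "B212", amount: 200, status: "A" },
  { cust_id: "A123", amount: 300, status: "D" },
];

const totals = mapReduceSketch(
  orders,
  (doc, emit) => emit(doc.cust_id, doc.amount),
  (key, values) => values.reduce((sum, v) => sum + v, 0),
  {
    query: (doc) => doc.status === "A",
    finalize: (key, total) => ({ total }),
  }
);

console.log(totals); // { A123: { total: 750 }, B212: { total: 200 } }
```

The order with status "D" never reaches map because query removes it first, which is why A123 totals 750 rather than 1050.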

Aggregation pipeline versus MapReduce

| Comparison | Aggregation pipeline | MapReduce |
| --- | --- | --- |
| Purpose | Designed to improve the performance and usability of aggregation tasks | For processing large data sets; MapReduce is more convenient when the data volume is huge |
| Key features | Pipeline operators can be repeated as needed, and a pipeline operator does not have to produce one output document for each input document | In addition to grouping operations, can perform complex aggregation tasks and incremental aggregation of continuously growing data sets |
| Flexibility | Limited to the operators and expressions supported by the aggregation pipeline | Custom map, reduce, and finalize JavaScript functions give the aggregation logic flexibility |
| Output | Returns the results as a cursor; if the pipeline includes the $out or $merge stage, the cursor is empty | Returns the results with various options: inline, new collection, merge, replace, reduce |
| Sharding | Supports both non-sharded and sharded input collections | Supports both non-sharded and sharded input collections |

For a more detailed comparison, see the official documentation: docs.mongodb.com/manual/refe…
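As a point of comparison, the MapReduce example from the official documentation (summing amount per cust_id for orders with status "A") can also be written as an aggregation pipeline. This is a sketch; the collection name `orders` and the output field name `value` are assumptions made for this example.

```javascript
const orderTotals = [
  { $match: { status: "A" } },                                 // same role as `query`
  { $group: { _id: "$cust_id", value: { $sum: "$amount" } } }, // same role as map plus reduce
  { $out: "order_totals" },                                    // same role as `out`
];

// `db` exists only in the mongo shell, so the call is guarded elsewhere.
if (typeof db !== "undefined") {
  db.orders.aggregate(orderTotals);
  printjson(db.order_totals.find().toArray());
}
```

Here each MapReduce parameter maps onto one stage, with no custom JavaScript functions involved.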

Likes, follows, and favorites are welcome

Friends, your support and encouragement are what motivate me to keep sharing and to improve the quality

All right, that's it for this time

Technology is open, and our mindset should be even more open. Embrace change, live in the sun, and strive to move forward.

I'm Nezha. Likes are welcome, and see you next time ~