[Mongo Series] MongoDB Learning, Part 4: A Review of Aggregation

What is aggregated data?

Let's first look at what aggregated data is.

Data aggregation is the merging of data from different data sources.

Clustering, also known as cluster analysis, is a technique for analyzing statistical data.

It is widely used in many fields, including machine learning, data mining, pattern recognition, image analysis and bioinformatics.

What is an aggregated query?

Aggregation operations process data records and return computed results.

Aggregation operations group values from multiple documents together, and can perform a variety of operations on the grouped data to return a single result.

Aggregation operations generally fall into the following three categories:

Single-purpose aggregation methods
Aggregation pipeline
MapReduce

Docs.mongodb.com/manual/aggr…

Single-purpose aggregation methods

MongoDB itself provides the following single-purpose aggregation methods. Compared with the aggregation pipeline and MapReduce, they are less flexible and offer fewer features.

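db.<collection>.estimatedDocumentCount()

Returns an approximate count of the documents; the value is an estimate.

db.<collection>.count()

Counts the number of documents in a collection or view. For an accurate count computed through aggregation, MongoDB also provides db.<collection>.countDocuments().

db.<collection>.distinct()

Lists the distinct values of a field.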

For example:

> db.users.find()
{ "_id" : ObjectId("61584aeeee74dfe04dac57e9"), "name" : "xiaokeai", "age" : 25, "hobby" : "reading", "infos" : { "tall" : 175, "height" : 62 }, "school" : "cs" }
{ "_id" : ObjectId("615a56d6bc6afecd2cff8f96"), "name" : "xiaozhu", "age" : 15, "hobby" : "basketball", "infos" : { "tall" : 190, "height" : 70 }, "school" : "sh" }
{ "_id" : ObjectId("615a5856d988690b07c69f64"), "name" : "xiaopang" }
{ "_id" : ObjectId("615a5917d988690b07c69f66"), "name" : "nancy", "age" : 25, "hobby" : "study", "infos" : { "tall" : 175, "height" : 60 }, "school" : "hn" }
{ "_id" : ObjectId("615a5917d988690b07c69f67"), "name" : "job", "age" : 19, "hobby" : "basketball", "infos" : { "tall" : 170, "height" : 70 }, "school" : "nj" }

> db.users.distinct("age")
[15, 19, 25]

In the example above, db.users.distinct("age") lists the distinct values of the age field.
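To make the semantics concrete, here is a minimal plain-JavaScript sketch of what distinct computes. It runs in Node without a MongoDB connection. The users array copies only the name and age fields of the sample documents, and distinctValues is a hypothetical helper written for illustration, not a MongoDB API.

```javascript
// Sample documents, reduced to the fields needed here.
const users = [
  { name: "xiaokeai", age: 25 },
  { name: "xiaozhu", age: 15 },
  { name: "xiaopang" }, // no age field, so it contributes no value
  { name: "nancy", age: 25 },
  { name: "job", age: 19 },
];

// Hypothetical helper: collect each value of `field` once.
function distinctValues(docs, field) {
  const seen = new Set();
  for (const doc of docs) {
    if (doc[field] !== undefined) seen.add(doc[field]);
  }
  // Sorted only so the order matches the shell output shown above.
  return [...seen].sort((a, b) => a - b);
}

const ages = distinctValues(users, "age");
console.log(ages); // [ 15, 19, 25 ]
```

The document without an age field is simply skipped, which is why only three values come back from five documents.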

Aggregation pipeline

Docs.mongodb.com/manual/core…

The aggregation pipeline consists of multiple stages, and each stage transforms the documents as they pass through it. It can be thought of as a pipe in Linux, where the output of the previous command is the input of the next one.

A pipeline is run with db.<collection>.aggregate(pipeline, options), which takes the following parameters:

pipeline

A sequence of aggregation stages. Except for $out, $merge, and $geoNear, every stage may appear multiple times in a pipeline.

options

Optional. Additional parameters for the aggregate operation.

These include the query plan (explain), whether temporary files may be used (allowDiskUse), the cursor, the maximum operation time (maxTimeMS), read and write concerns, a forced index (hint), and so on.

Commonly used aggregation pipeline stages

The commonly used aggregation pipeline stages are summarized below.

Stage	Description
$match	Filters documents
$group	Groups documents
$project	Selects the fields to display
$lookup	Joins with another collection
$unwind	Expands an array into one document per element
$out	Writes the results into a new collection
$count	Counts documents
$sort, $skip, $limit	Sorting and pagination

For the other stages, see the official documentation: docs.mongodb.com/manual/refe…
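The idea that each stage consumes the previous stage's output can be sketched in plain JavaScript. This is only a simulation that runs in Node without a database: the helpers match, groupCount, sortBy, and runPipeline are hypothetical names, and the age threshold of 18 is an illustrative assumption. The comment at the top shows the mongosh pipeline the sketch imitates.

```javascript
// mongosh pipeline being imitated:
// db.users.aggregate([
//   { $match: { age: { $gte: 18 } } },
//   { $group: { _id: "$age", count: { $sum: 1 } } },
//   { $sort: { count: -1 } }
// ])

const users = [
  { name: "xiaokeai", age: 25 },
  { name: "xiaozhu", age: 15 },
  { name: "xiaopang" },
  { name: "nancy", age: 25 },
  { name: "job", age: 19 },
];

// Each stage is a function from an array of documents to a new array.
const match = (predicate) => (docs) => docs.filter(predicate);

const groupCount = (keyOf) => (docs) => {
  const counts = new Map();
  for (const doc of docs) {
    const key = keyOf(doc);
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  return [...counts].map(([_id, count]) => ({ _id, count }));
};

const sortBy = (compare) => (docs) => [...docs].sort(compare);

// Feed the output of every stage into the next one, like a Linux pipe.
const runPipeline = (docs, stages) =>
  stages.reduce((input, stage) => stage(input), docs);

const result = runPipeline(users, [
  match((doc) => doc.age >= 18),
  groupCount((doc) => doc.age),
  sortBy((a, b) => b.count - a.count),
]);

console.log(result); // [ { _id: 25, count: 2 }, { _id: 19, count: 1 } ]
```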

Take $count as an example

The first stage, $group, groups the data. In an aggregation pipeline the output of one stage is the input of the next, which here is $project, the stage that selects the fields to display.
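As a rough illustration, the sketch below assumes the example follows the $group plus $project form that the MongoDB manual gives as the equivalent of $count. It is a plain-JavaScript simulation that runs in Node; groupAll and dropId are hypothetical helpers and the output field name total is an assumption.

```javascript
// mongosh form being imitated:
// db.users.aggregate([
//   { $group: { _id: null, total: { $sum: 1 } } },
//   { $project: { _id: 0 } }
// ])
// which gives the same result as: db.users.aggregate([{ $count: "total" }])

const users = [
  { name: "xiaokeai" },
  { name: "xiaozhu" },
  { name: "xiaopang" },
  { name: "nancy" },
  { name: "job" },
];

// $group with _id: null puts every document into one group and counts them.
// With no input documents there is no group, so nothing is returned.
const groupAll = (docs) =>
  docs.length === 0 ? [] : [{ _id: null, total: docs.length }];

// $project with _id: 0 removes the _id field and keeps the rest.
const dropId = (docs) => docs.map(({ _id, ...rest }) => rest);

// The output of the first stage is the input of the second.
const counted = dropId(groupAll(users));
console.log(counted); // [ { total: 5 } ]
```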

MapReduce

Docs.mongodb.com/manual/core…

A MapReduce operation splits a large data-processing job into parts that are processed in parallel, and then merges the results together.

MapReduce has the following two phases:

The map phase, which brings together document data with the same key
The reduce phase, which combines the results of the map operation into the statistical output

Here’s an example from the official website

The query option keeps only the documents with status: "A", the map function emits cust_id and amount for each of them, and the results are finally written to a new collection named order_totals.
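The flow just described can be sketched in plain JavaScript that runs in Node without a database. The mongosh call in the comment follows the form of the manual's example; the orders array is illustrative sample data, and mapReduceSketch is a hypothetical helper. Unlike MongoDB, where emit is available globally inside map, the sketch passes emit to the map function as an argument.

```javascript
// mongosh form of the example:
// db.orders.mapReduce(
//   function () { emit(this.cust_id, this.amount); },
//   function (key, values) { return Array.sum(values); },
//   { query: { status: "A" }, out: "order_totals" }
// )

// Illustrative sample data (an assumption, not taken from the article).
const orders = [
  { cust_id: "A123", amount: 500, status: "A" },
  { cust_id: "A123", amount: 250, status: "A" },
  { cust_id: "B212", amount: 200, status: "A" },
  { cust_id: "A123", amount: 300, status: "D" },
];

// Hypothetical helper: filter, map into key/value pairs, then reduce per key.
function mapReduceSketch(docs, mapFn, reduceFn, query) {
  const valuesByKey = new Map();
  const emit = (key, value) => {
    if (!valuesByKey.has(key)) valuesByKey.set(key, []);
    valuesByKey.get(key).push(value);
  };
  // Map phase: each matching document becomes `this` inside mapFn.
  for (const doc of docs.filter(query)) mapFn.call(doc, emit);
  // Reduce phase: collapse the values collected for each key.
  return [...valuesByKey].map(([key, values]) => ({
    _id: key,
    value: reduceFn(key, values),
  }));
}

const orderTotals = mapReduceSketch(
  orders,
  function (emit) { emit(this.cust_id, this.amount); },
  (key, values) => values.reduce((sum, v) => sum + v, 0),
  (doc) => doc.status === "A"
);

console.log(orderTotals);
// [ { _id: 'A123', value: 750 }, { _id: 'B212', value: 200 } ]
```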

The syntax of MapReduce operations is as follows:

db.<collection>.mapReduce(
  <map>,
  <reduce>,
  {
    out: <collection>,
    query: <document>,
    sort: <document>,
    limit: <number>,
    finalize: <function>,
    scope: <document>,
    jsMode: <boolean>,
    verbose: <boolean>,
    bypassDocumentValidation: <boolean>
  }
)

map

Splits the data into key-value pairs and hands them to the reduce function

reduce

Performs the statistical operation on the values of each key

out

Specifies where the results are written, for example a given collection

query

Optional. A filter condition that selects the documents passed to map

sort

Optional. Sorts the documents before they are sent to map

limit

Optional. Limits the number of documents sent to map

finalize

Optional. Modifies the reduce result before it is output

scope

Optional. Specifies global variables for map, reduce, and finalize

jsMode

Optional. The default value is false. Whether to convert intermediate data to BSON format during MapReduce

verbose

Optional. Whether to show timing information in the result. The default is false

bypassDocumentValidation

Optional. Whether to skip document validation
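To show how the input-side options and finalize fit around map and reduce, here is a small plain-JavaScript sketch that runs in Node. It only imitates the order in which the parameters above take effect: query, then sort, then limit, then map, reduce, and finalize. The helper mapReduceWithOptions, the sample orders, and the option values are all assumptions for illustration; in this sketch query and sort are ordinary JavaScript functions rather than MongoDB documents.

```javascript
// Hypothetical helper imitating the order in which the options apply.
function mapReduceWithOptions(docs, mapFn, reduceFn, options = {}) {
  const { query, sort, limit, finalize } = options;

  let input = query ? docs.filter(query) : [...docs]; // query: select documents
  if (sort) input.sort(sort);                         // sort: order them before map
  if (limit !== undefined) input = input.slice(0, limit); // limit: cap what map sees

  const valuesByKey = new Map();
  const emit = (key, value) => {
    if (!valuesByKey.has(key)) valuesByKey.set(key, []);
    valuesByKey.get(key).push(value);
  };
  for (const doc of input) mapFn.call(doc, emit); // map phase

  return [...valuesByKey].map(([key, values]) => {
    const reduced = reduceFn(key, values); // reduce phase
    // finalize: optionally reshape the reduced value before output
    return { _id: key, value: finalize ? finalize(key, reduced) : reduced };
  });
}

// Illustrative sample data (an assumption).
const orders = [
  { cust_id: "A123", amount: 500, status: "A" },
  { cust_id: "A123", amount: 250, status: "A" },
  { cust_id: "B212", amount: 200, status: "A" },
  { cust_id: "A123", amount: 300, status: "D" },
  { cust_id: "C300", amount: 100, status: "A" },
];

const summary = mapReduceWithOptions(
  orders,
  function (emit) { emit(this.cust_id, this.amount); },
  (key, values) => values.reduce((sum, v) => sum + v, 0),
  {
    query: (doc) => doc.status === "A",
    sort: (a, b) => b.amount - a.amount, // largest amounts first
    limit: 3,                            // only the top three reach map
    finalize: (key, total) => ({ total, large: total >= 500 }),
  }
);

console.log(JSON.stringify(summary));
// [{"_id":"A123","value":{"total":750,"large":true}},{"_id":"B212","value":{"total":200,"large":false}}]
```

Because limit is applied before map, the C300 order never reaches the map function and does not appear in the output.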

Aggregation pipeline versus MapReduce

Comparison	Aggregation pipeline	MapReduce
Purpose	Designed to improve the performance and usability of aggregation tasks	Designed for processing large data sets; MapReduce is more convenient when the data volume is huge
Characteristics	Pipeline operators can be repeated as needed, and a pipeline operator does not have to produce one output document for every input document	In addition to grouping operations, it can perform complex aggregation tasks and incremental aggregation on continuously growing data sets
Flexibility	Limited to the operators and expressions supported by the aggregation pipeline	Custom map, reduce, and finalize JavaScript functions provide flexibility in the aggregation logic
Output	Returns the results as a cursor; if the pipeline includes the $out or $merge stage, the cursor is empty	Returns the results with various options: inline, new collection, merge, replace, reduce
Sharding	Supports both non-sharded and sharded input collections	Supports both non-sharded and sharded input collections

For a more detailed comparison, see the official documentation: docs.mongodb.com/manual/refe…

Likes, follows, and favorites are welcome

Friends, your support and encouragement are what keep me sharing and improving the quality of my posts.

All right, that's it for this time.

Technology is open, and our mindset should be even more open. Embrace change, live in the sun, and keep moving forward.

I am Nezha. Likes are welcome, and see you next time ~
