Author | LAKSHAY ARORA compile | Flin source | analyticsvidhya

introduce

With the spread of the Internet, we are now generating data at an unprecedented rate. Because performing any type of analysis requires us to collect/query the necessary data from the database, choosing the right tool to query the data becomes critical. Therefore, it is impossible to imagine using SQL to process such a large amount of data because of the high cost of each query.

This is where MongoDB comes in. MongoDB is an unstructured database that stores data in the form of documents. In addition, MongoDB can process large amounts of data very efficiently and is the most widely used NoSQL database because it provides a rich query language and flexible and fast access to data.

In this article, we’ll look at several examples of how to query MongoDB databases using PyMongo. In addition, we’ll see how to use comparison and logical operators, regular expressions, and the basics of aggregation pipelines.

This article is directing a beginners tutorial (www.analyticsvidhya.com/blog/2020/0… “, where we discussed unstructured databases, installation steps, and the challenges of basic MongoDB operations. Therefore, if you are new to MongoDB, I recommend reading this article first.

directory

  1. What is PyMongo?

  2. Installation steps

  3. Insert data into the database

  4. Querying the database

    1. Filtering by field
    2. Filter by comparison operator
    3. Filtering based on logical operators
    4. Commonly used expression
    5. Polymerization pipe
  5. endnotes

What is PyMongo?

PyMongo is a Python library that allows us to connect to MongoDB. Furthermore, this is the most recommended method that MongoDB uses with Python.

Additionally, we chose Python to interact with MongoDB because it is one of the most commonly used and powerful languages in data science. PyMongo allows us to retrieve data using dictionary-like syntax.

If you are new to Python, I suggest you take this free course: Getting Started with Python.

  • Courses.analyticsvidhya.com/courses/int…

Installation steps

Installing PyMongo is straightforward. Here, I’m assuming you have Python 3 and MongoDB installed. The following command will help you install PyMongo:

pip3 install pymongo
Copy the code

Insert data into the database

Now let’s set it up and use PyMongo to query the MongoDB database. First, we insert the data into the database. The following steps will help you

  1. Import the library and connect to the Mongo client

Start the MongoDB server on your computer. I assume it is running the file at localhost:27017.

Let’s start importing some of the libraries we’ll be using. By default, the MongoDB server runs on port 27017 on the local computer. We will then use the PyMongo library to connect to the MongoDB client.

Then get the database instance of database SAMple_DB. In case it doesn’t exist, MongoDB will create one for you.

Import the required libraries
import pymongo
import pprint
import json
import warnings
warnings.filterwarnings('ignore')

Connect to MongoClient
client = pymongo.MongoClient('mongodb://localhost:27017')

Get database
database = client['sample_db']
Copy the code
  1. Create a collection from a JSON file

We’ll use data from a food delivery company that operates in multiple cities. In addition, they have various distribution centers in these cities that are used to send orders to their customers. You can download the data and code here.

  • Drive.google.com/drive/folde…
  1. Weekly_demand:

    • Id: The unique ID of each document
    • Week week: number
    • Center_id: indicates the unique ID of the distribution center
    • Meal_id: indicates the unique ID of a meal
    • Checkout_price: Final price, including discounts, taxes, and shipping charges
    • Base_price: basic meal price
    • Emailer_for_promotion: Send emails to promote meals
    • Homepage_featured: Meals served on the homepage
    • Num_orders :(target) order number
  2. Meal_info:

    • Meal_id: indicates the unique ID of a meal
    • Category: type of meal (drinks/snacks/soups…)
    • Cuisine: Indian/Italian /…

We will then create two collections in the sample_DB database:

Create a weekly requirements collection
database.create_collection("weekly_demand")

# Create meal information
database.create_collection("meal_info")
Copy the code

  1. Inserts data into the collection

Now, we have the data in JSON format. Then, we’ll get an instance of the collection, read the data file, and use the insert_many function to insert the data.

# get collection weekly_demand
weekly_demand_collection = database.get_collection("weekly_demand")

# open the weekly_demand JSON file
with open("weekly_demand.json") as f:
    file_data = json.load(f)
Insert data into the collection
weekly_demand_collection.insert_many(file_data)

Get total data points
weekly_demand_collection.find().count()
# > > 456548

# Get collectible meals
meal_info_collection = database.get_collection("meal_info")

Open the meat_info JSON file
with open("meal_info.json") as f:
    file_data = json.load(f)
    
Insert data into the collection
meal_info_collection.insert_many(file_data)

Get total data points
meal_info_collection.find().count()
# > > 51
Copy the code

Finally, there are 456,548 documents in the weekly_DEMANd_collection and 51 documents in the meal information collection. Now, let’s take a look at one document in each collection.

weekly_demand_collection

weekly_demand_collection.find_one()
Copy the code

Dietary information set

meal_info_collection.find_one()
Copy the code

Now, our data is ready. Let’s continue querying the database.

Querying the database

We can query the MongoDB database with PyMonfo with lookup functionality to get all the results that satisfy a given condition, or we can use the find_one function, which will return only one result that satisfies the condition.

Here is the syntax for find and find_one:

your_collection.find( {<< query >>} , { << fields>>} )
Copy the code

You can query the database using the following filtering techniques

  1. Filtering by field

For example, you have hundreds of fields and you only want to see a few of them. You can do this by setting all required field names to the value 1. For example,

weekly_demand_collection.find_one( {}, { "week": 1."checkout_price" : 1})
Copy the code

On the other hand, if you just want to discard some fields from the entire document, you can set the field name to be equal to 0. Therefore, only those fields will be excluded. Note that you cannot use a combination of 1s and 0s to get fields. It’s either all one or all zero.

weekly_demand_collection.find_one( {}, {"num_orders" : 0."meal_id" : 0})
Copy the code

  1. Filter conditions

Now, in this section, we will provide a condition in the first brace and remove the field in the second. Therefore, it will return the first document with center_id equal 55 and meal_id equal 1885, and it will also discard fields _id and week.

weekly_demand_collection.find_one( {"center_id" : 55."meal_id" : 1885}, {"_id" : 0."week" : 0})Copy the code

  1. Filter by comparison operator

Here are nine comparison operators in MongoDB.

The name of the describe
$eq It matches values equal to the specified value.
$gt It matches values greater than the specified value.
$gte It matches all values greater than or equal to the specified value
$in It will match any value specified in the array
$lt It matches all values less than the specified value
$lte It matches all values that are less than or equal to the specified value
$ne It matches all values that are not equal to the specified value
$nin It will not match any of the values specified in the array

Here are some examples of using these comparison operators

  1. Equals and does not equal

We will find all documents with center_id equal to 55 and homepage_Featured not equal to 0. Because we will be using the find function, it will return a cursor for the command. In addition, the for loop is used to iterate over the query results.

result_1 = weekly_demand_collection.find({
    "center_id" : { "$eq" : 55},
    "homepage_featured" : { "$ne" : 0}})for i in result_1:
    print(i)
Copy the code

  1. In and out of the list

For example, you need to match one element with multiple elements. In this case, we can use the IN operator instead of using the IN operator multiple times, instead of using the EQ operator multiple times. We will try to find all documents with center_id 24 or 11.

result_2 = weekly_demand_collection.find({
    "center_id" : { "$in" : [ 24.11]}})for i in result_2:
    print(i)
Copy the code

We then find all documents that do not have Center_id in the specified list. The following query will return all documents with center_id not 24 or 11.

result_3 = weekly_demand_collection.find({
    "center_id" : { "$nin" : [ 24.11]}})for i in result_3:
    print(i)
Copy the code

  1. Less than and greater than

Now, let’s find all documents with center_id 55 and checkout_price greater than 100 and less than 200. To do this, use the following syntax

result_4 = weekly_demand_collection.find({
    "center_id" : 55."checkout_price" : { "$lt" : 200."$gt" : 100}})for i in result_4:
    print(i)
Copy the code

  1. Filters based on logical operators
The name of the describe
$and It concatenates the query statement with the logic, AND returns all documents that meet both criteria.
$not It reverses the results of the query and returns documents that do not match the query expression.
$nor It uses logic to concatenate query clauses and NOR returns any documents that do not match the clauses.
$or It uses logic to concatenate query clauses, and OR returns all documents that match any clause condition.

The following example illustrates the use of the logical operator –

  1. AND operator

The following query will return documents with center_ID equal 11 and meal number not equal 1778. The subquery of the AND operator will appear in the list.

result_5 = weekly_demand_collection.find({
    "$and" : [{
                 "center_id" : { "$eq" : 11}}, {"meal_id" : { "$ne" : 1778}}}])for i in result_5:
    print(i)
Copy the code

  1. Or operator.

The following query will return all documents with center_ID equal to 11 or with meal ID 1207 or 2707. In addition, the subquery of the OR operator will be in the list.

result_6 = weekly_demand_collection.find({
    "$or" : [{
                 "center_id" : { "$eq" : 11}}, {"meal_id" : { "$in" : [1207.2707]]}}})for i in result_6:
    print(i)
Copy the code

  1. Filter using regular expressions

Regular expressions are useful when you have text fields and want to search for documents with a particular pattern. If you want to learn more about regular expressions, I strongly encourage you to read this tutorial: Python Regular Expressions for Beginners.

  • www.analyticsvidhya.com/blog/2015/0…

It can be used with the $regex operator, and we can provide values for the operator to change the REgex mode to MATC. We will enable the dining information set in this query, and then find the document that begins with C in the gourmet food field.

result_7 = meal_info_collection.find({
    "cuisine" : { "$regex" : "^C"}})for i in result_7:
    print(i)
Copy the code

Let’s look at another example of regular expressions. We will look for all documents that start with “S” and end with “Ian” in all categories.

result_8 = meal_info_collection.find({
    "$and": [{"category" : {
            "$regex" : "^S"
        }},
        {
            "cuisine" : {
                "$regex" : "ian$"}}}])for i in result_8:
    print(i)
Copy the code

  1. Polymerization pipe

MongoDB’s aggregation pipeline provides a framework for performing a series of data transformations on data sets. Here’s the syntax:

your_collection.aggregate( [ { <stage1> }, { <stage2> },.. ] )
Copy the code

The first phase takes the complete set of documents as input, and then each subsequent phase takes the result set of the previous transformation as input to the next phase and produces the output.

There are about 10 transformations available in the MongoDB summary, and we’ll look at match and match and match and Group in this article. We will discuss each transformation in detail in an upcoming MongoDB article.

For example, in the first phase, we will match documents with center_id equal to 11, and in the next phase, it will count the number of documents with center_id equal to 11. Notice that we have assigned a value to the $count operator equal to total_ROWS in the second phase, which is the name of the field we want to display in the output.

result_9 = weekly_demand_collection.aggregate([
    ## stage 1
    {
        "$match" : 
                 {"center_id" : {"$eq" : 11}}},## stage 2
    {
        "$count" : "total_rows"}])for i in result_9:
    print(i)
Copy the code

Now, let’s take another example where the first stage is the same as before, center_id equals 11. In the second stage, we compute the average value of num_Orders for center_id 11 and the unique meal_IDS for center_id 11.

result_10 = weekly_demand_collection.aggregate([
    ## stage 1
    {
        "$match" : 
                 {"center_id" : {"$eq" : 11}}},## stage 2
    {
        "$group" : { "_id" : 0 ,
                     "average_num_orders": { "$avg" : "$num_orders"},
                     "unique_meal_id" : {"$addToSet" : "$meal_id"}}}])for i in result_10:
    print(i)
Copy the code

endnotes

Today, there is an incredible amount of data, so there is a need to find better alternatives to querying data. In summary, in this article, we learned how to query MongoDB databases using PyMongo. In addition, we learned how to apply various filters as needed.

If you want to learn more about querying data, I recommend you take the following course — Structured Query Language (SQL) for Data Science

  • Courses.analyticsvidhya.com/courses/str…

In the following articles, we will discuss the aggregation pipeline in detail.

Thanks for reading!

The original link: www.analyticsvidhya.com/blog/2020/0…

Welcome to panchuangai blog: panchuang.net/

Sklearn123.com/

Welcome to docs.panchuang.net/