MongoDB is a non-relational database written in C++. It is an open-source database system based on distributed file storage. Its documents are stored in a form similar to JSON objects, and field values can contain other documents, arrays, and arrays of documents, which makes it very flexible. In this section, we’ll look at how to store data in MongoDB with Python 3.
1. Preparation
Before you start, make sure MongoDB is installed and running, and that Python’s PyMongo library is installed.
2. Connect to MongoDB
To connect to MongoDB, we use MongoClient from the PyMongo library. Its first argument is host and its second is port (if port is not passed, it defaults to 27017):
import pymongo
client = pymongo.MongoClient(host='localhost', port=27017)
This creates a MongoDB connection object.
In addition, the first parameter of MongoClient, host, can also be a MongoDB connection string, which begins with mongodb://. For example:
client = pymongo.MongoClient('mongodb://localhost:27017/')
This produces the same connection.
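If the server requires authentication, the username and password go into the same connection string and must be percent-escaped; the PyMongo documentation recommends urllib.parse.quote_plus for this. The stdlib-only sketch below assembles such a string. The helper name build_mongo_uri and the sample credentials are hypothetical; the returned string is what you would pass to pymongo.MongoClient().

```python
from urllib.parse import quote_plus


def build_mongo_uri(host="localhost", port=27017, username=None, password=None):
    """Assemble a mongodb:// connection string (hypothetical helper).

    Credentials are escaped with quote_plus so that characters such as
    '@', ':' or '/' cannot break the URI.
    """
    credentials = ""
    if username is not None:
        credentials = quote_plus(username)
        if password is not None:
            credentials += ":" + quote_plus(password)
        credentials += "@"
    return f"mongodb://{credentials}{host}:{port}/"


print(build_mongo_uri())  # mongodb://localhost:27017/
print(build_mongo_uri(username="admin", password="p@ss/word"))
# mongodb://admin:p%40ss%2Fword@localhost:27017/
```

Escaping matters because an unescaped '@' in the password would be read as the end of the credentials.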
3. Specify the database
MongoDB can hold multiple databases, so next we need to specify which one to operate on. Here we use the test database as an example and specify it in the program:
db = client.test
Accessing the test attribute of client returns the test database. Alternatively, we can use dictionary-style access:
db = client['test']
These two ways are equivalent.
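Attribute access only works when the name is a valid Python identifier: a database called test-db needs the bracket form, because client.test-db would be parsed as a subtraction. The toy class below is a hypothetical stand-in, not PyMongo’s real class, showing the usual way such an equivalence is built in Python (roughly the pattern MongoClient itself follows): __getattr__ forwards to __getitem__, and names starting with an underscore are not forwarded.

```python
class Namespace:
    """Toy stand-in for a client, database, or collection object (hypothetical)."""

    def __init__(self, name):
        self.name = name

    def __getitem__(self, key):
        # Bracket access: return a child namespace such as "test" or "test.students".
        return Namespace(f"{self.name}.{key}" if self.name else key)

    def __getattr__(self, key):
        # Only called when normal lookup fails, so self.name is unaffected.
        if key.startswith("_"):
            raise AttributeError(key)
        return self[key]

    def __eq__(self, other):
        return isinstance(other, Namespace) and self.name == other.name


client = Namespace("")
print(client.test == client["test"])  # True
print(client["test-db"].name)         # test-db (no attribute form is possible)
print(client.test.students.name)      # test.students
```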
4. Specify a collection
Each MongoDB database in turn contains a number of collections, which are similar to tables in a relational database.
The next step is to specify the collection to operate on; here it is a collection named students. As with databases, there are two equivalent ways to specify a collection:
collection = db.students
collection = db['students']
Either way, we obtain a Collection object.
5. Insert data
Next, we are ready to insert data. For the students collection, we create a student record as a dictionary:
student = {
    'id': '20170101', 'name': 'Jordan', 'age': 20, 'gender': 'male'
}
This specifies the student’s ID number, name, age, and gender. Next, we insert it by calling the collection’s insert() method:
# insert() was removed in PyMongo 4.0; there, use insert_one() instead
result = collection.insert(student)
print(result)
In MongoDB, every document has an _id field that uniquely identifies it. If this field is not specified explicitly, an _id of type ObjectId is generated automatically. The insert() method returns this _id value.
The output is as follows:
5932a68615c2606814c91f3d
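An ObjectId is 12 bytes, printed as 24 hexadecimal characters, and its first 4 bytes hold the creation time in seconds since the Unix epoch, which is why later ObjectIds tend to sort after earlier ones. The stdlib-only sketch below decodes that timestamp from the hex string printed above; the helper name objectid_timestamp is hypothetical, and PyMongo’s own ObjectId exposes the same value through its generation_time attribute.

```python
from datetime import datetime, timezone


def objectid_timestamp(oid_hex):
    """Return the creation time embedded in a 24-character ObjectId hex string."""
    if len(oid_hex) != 24:
        raise ValueError("an ObjectId hex string has 24 characters")
    seconds = int(oid_hex[:8], 16)  # first 4 bytes: big-endian seconds since the epoch
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


created = objectid_timestamp("5932a68615c2606814c91f3d")
print(created.isoformat())  # 2017-06-03T12:07:34+00:00
```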
We can also insert multiple documents at once by passing them in a list:
student1 = {
    'id': '20170101', 'name': 'Jordan', 'age': 20, 'gender': 'male'
}
student2 = {
    'id': '20170202', 'name': 'Mike', 'age': 21, 'gender': 'male'
}
result = collection.insert([student1, student2])
print(result)
The result is a list of the corresponding _id values:
[ObjectId('5932a80115c2606a59e8a048'), ObjectId('5932a80115c2606a59e8a049')]
In fact, the insert() method has been deprecated since PyMongo 3.0 and was removed in PyMongo 4.0, so it only works with older versions. The officially recommended methods are insert_one() and insert_many(), which insert a single record and multiple records, respectively:
student = {
    'id': '20170101', 'name': 'Jordan', 'age': 20, 'gender': 'male'
}
result = collection.insert_one(student)
print(result)
print(result.inserted_id)
The output is as follows:
<pymongo.results.InsertOneResult object at 0x10d68b558>
5932ab0f15c2606f0c1cf6c5
Unlike insert(), insert_one() returns an InsertOneResult object, whose inserted_id attribute gives the _id.
For the insert_many() method, we can pass the data as a list, as shown in the following example:
student1 = {
    'id': '20170101', 'name': 'Jordan', 'age': 20, 'gender': 'male'
}
student2 = {
    'id': '20170202', 'name': 'Mike', 'age': 21, 'gender': 'male'
}
result = collection.insert_many([student1, student2])
print(result)
print(result.inserted_ids)
The output is as follows:
<pymongo.results.InsertManyResult object at 0x101dea558>
[ObjectId('5932abf415c2607083d3b2ac'), ObjectId('5932abf415c2607083d3b2ad')]
This method returns an InsertManyResult object, and its inserted_ids attribute gives the list of _id values of the inserted documents.
6. Query
After inserting data, we can query it with find_one() or find(): find_one() returns a single result, while find() returns a Cursor, which behaves like a generator. For example:
result = collection.find_one({'name': 'Mike'})
print(type(result))
print(result)
Here we query for the record whose name is Mike. The result is a dictionary:
<class 'dict'>
{'_id': ObjectId('5932a80115c2606a59e8a049'), 'id': '20170202', 'name': 'Mike', 'age': 21, 'gender': 'male'}
As you can see, it has an extra _id field, which was added automatically during insertion.
We can also query by ObjectId, which requires the ObjectId class from the bson package (installed together with PyMongo):
from bson.objectid import ObjectId
result = collection.find_one({'_id': ObjectId('593278c115c2602667ec6bae')})
print(result)
The result is again a dictionary:
{'_id': ObjectId('593278c115c2602667ec6bae'), 'id': '20170101', 'name': 'Jordan', 'age': 20, 'gender': 'male'}
If no document matches the query, None is returned.
To query multiple records, we use the find() method. For example, here we look for records whose age is 20:
results = collection.find({'age': 20})
print(results)
for result in results:
    print(result)
The output is as follows:
<pymongo.cursor.Cursor object at 0x1032d5128>
{'_id': ObjectId('593278c115c2602667ec6bae'), 'id': '20170101', 'name': 'Jordan', 'age': 20, 'gender': 'male'}
{'_id': ObjectId('593278c815c2602678bb2b8d'), 'id': '20170102', 'name': 'Kevin', 'age': 20, 'gender': 'male'}
{'_id': ObjectId('593278d815c260269d7645a8'), 'id': '20170103', 'name': 'Harden', 'age': 20, 'gender': 'male'}
The result is a Cursor object, which behaves much like a generator: we iterate over it to get each result, and each result is a dictionary.
To query records whose age is greater than 20, write:
results = collection.find({'age': {'$gt': 20}})
Here the value in the query condition is no longer a plain number but a dictionary: its key, $gt, means greater than, and its value is 20.
The comparison operators are summarized in the following table.
Operator | Meaning | Example |
---|---|---|
$lt | Less than | {'age': {'$lt': 20}} |
$gt | Greater than | {'age': {'$gt': 20}} |
$lte | Less than or equal to | {'age': {'$lte': 20}} |
$gte | Greater than or equal to | {'age': {'$gte': 20}} |
$ne | Not equal to | {'age': {'$ne': 20}} |
$in | Equal to any value in the list | {'age': {'$in': [20, 23]}} |
$nin | Not equal to any value in the list | {'age': {'$nin': [20, 23]}} |
Regular-expression queries are also supported. For example, to query students whose name starts with M:
results = collection.find({'name': {'$regex': '^M.*'}})
Here $regex specifies a regular-expression match, and ^M.* is a regular expression matching strings that start with M.
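A regular expression is searched for anywhere in the string unless it is anchored, so ^M.* selects the same names as ^M. Matching is case-sensitive by default; MongoDB accepts an extra $options key (for example 'i' for case-insensitive), and PyMongo also accepts a compiled Python pattern as the value, e.g. {'name': re.compile('^M')}. The stdlib sketch below only mimics this filtering on a plain list of names (the lowercase "mia" is a hypothetical extra entry) using Python’s re module, whose syntax is close to, but not identical with, the PCRE syntax MongoDB uses.

```python
import re

names = ["Harden", "Jordan", "Kevin", "Mark", "Mike", "mia"]

# Like {'name': {'$regex': '^M.*'}}: keep names in which the pattern is found.
starts_with_m = [n for n in names if re.search(r"^M.*", n)]

# Like {'name': {'$regex': '^M', '$options': 'i'}}: the same search, ignoring case.
starts_with_m_any_case = [n for n in names if re.search(r"^M", n, re.IGNORECASE)]

print(starts_with_m)           # ['Mark', 'Mike']
print(starts_with_m_any_case)  # ['Mark', 'Mike', 'mia']
```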
Some other operators are summarized in the following table.
Operator | Meaning | Example | What the example matches |
---|---|---|---|
$regex | Matches a regular expression | {'name': {'$regex': '^M.*'}} | name starts with M |
$exists | Whether the field exists | {'name': {'$exists': True}} | the name field exists |
$type | Type check | {'age': {'$type': 'int'}} | age is of type int |
$mod | Modulo operation | {'age': {'$mod': [5, 0]}} | age divided by 5 leaves remainder 0 |
$text | Text search (requires a text index) | {'$text': {'$search': 'Mike'}} | text-indexed fields contain the string Mike |
$where | Advanced query with a JavaScript expression | {'$where': 'obj.fans_count == obj.follows_count'} | fans_count equals follows_count |
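$exists and $mod are easy to mirror in plain Python, which makes their meaning precise: $exists only asks whether the key is present (a field holding None still exists), and {'$mod': [divisor, remainder]} keeps values where value % divisor == remainder. The helper names and documents below are hypothetical, operate on in-memory dictionaries rather than a collection, and the $mod helper assumes non-negative integers.

```python
def has_field(doc, field):
    """Mirror {field: {'$exists': True}}: the key is present, whatever its value."""
    return field in doc


def mod_matches(doc, field, divisor, remainder):
    """Mirror {field: {'$mod': [divisor, remainder]}} for non-negative integers."""
    value = doc.get(field)
    return isinstance(value, int) and value % divisor == remainder


docs = [
    {"name": "Jordan", "age": 20},
    {"name": "Mike", "age": 21},
    {"name": None, "age": 25},  # name exists even though its value is None
    {"age": 30},                # no name field at all
]

print([d.get("age") for d in docs if has_field(d, "name")])         # [20, 21, 25]
print([d.get("age") for d in docs if mod_matches(d, "age", 5, 0)])  # [20, 25, 30]
```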
For more detailed usage of these operators, see the official MongoDB documentation: https://docs.mongodb.com/manual/reference/operator/query/.
7. Count
To count the documents matched by a query, PyMongo 3.x offered the cursor’s count() method, e.g. collection.find().count(). That method was deprecated in PyMongo 3.7 and removed in 4.0; use the collection’s count_documents() method instead, which takes a filter. For example, to count all documents, pass an empty filter:
count = collection.count_documents({})
print(count)
Or count only the documents that meet a condition:
count = collection.count_documents({'age': 20})
print(count)
The result is an integer: the number of documents that meet the condition.
8. Sort
To sort, call the sort() method, passing in the field to sort by and the sort direction. For example:
results = collection.find().sort('name', pymongo.ASCENDING)
print([result['name'] for result in results])
The output is as follows:
['Harden', 'Jordan', 'Kevin', 'Mark', 'Mike']
Here pymongo.ASCENDING specifies ascending order; for descending order, pass pymongo.DESCENDING.
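sort() also accepts a list of (field, direction) pairs for multi-key sorting, e.g. collection.find().sort([('age', pymongo.DESCENDING), ('name', pymongo.ASCENDING)]), and pymongo.ASCENDING and pymongo.DESCENDING are simply the integers 1 and -1. The stdlib sketch below reproduces that combined ordering on in-memory dictionaries so the effect can be checked without a server; the helper sort_docs and the sample data are hypothetical, and every document is assumed to have the sort fields.

```python
ASCENDING, DESCENDING = 1, -1  # same values as pymongo.ASCENDING / pymongo.DESCENDING


def sort_docs(docs, keys):
    """Sort dictionaries by a list of (field, direction) pairs, like cursor.sort()."""
    result = list(docs)
    # Python's sort is stable, so sorting by the least significant key first
    # and the most significant key last yields the combined order.
    for field, direction in reversed(keys):
        result.sort(key=lambda d: d[field], reverse=(direction == DESCENDING))
    return result


students = [
    {"name": "Jordan", "age": 20},
    {"name": "Mike", "age": 21},
    {"name": "Harden", "age": 20},
    {"name": "Kevin", "age": 21},
]

ordered = sort_docs(students, [("age", DESCENDING), ("name", ASCENDING)])
print([(d["age"], d["name"]) for d in ordered])
# [(21, 'Kevin'), (21, 'Mike'), (20, 'Harden'), (20, 'Jordan')]
```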
9. Offset
Sometimes we want only some of the results. The skip() method offsets the results by a number of positions: for example, skip(2) ignores the first two elements and returns the third and later ones:
results = collection.find().sort('name', pymongo.ASCENDING).skip(2)
print([result['name'] for result in results])
The output is as follows:
['Kevin', 'Mark', 'Mike']
In addition, you can use the limit() method to specify the number of results to fetch, as shown in the following example:
results = collection.find().sort('name', pymongo.ASCENDING).skip(2).limit(2)
print([result['name'] for result in results])
The output is as follows:
['Kevin', 'Mark']
Without limit(), three results would have been returned; with limit(2), only the first two of them are returned.
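Conceptually, sort().skip(n).limit(m) selects the same documents as slicing the fully sorted result with [n:n + m]. The sketch below checks that analogy against the outputs above using the same five names; it is an in-memory illustration only, since the real cursor applies skip and limit on the server. The helper skip_limit is hypothetical, and it reproduces one detail of PyMongo’s Cursor.limit(): a limit of 0 means no limit.

```python
def skip_limit(sorted_items, skip=0, limit=0):
    """Mimic cursor.skip(skip).limit(limit) on an already sorted list.

    As with PyMongo's Cursor.limit(), a limit of 0 means "no limit".
    """
    end = None if limit == 0 else skip + limit
    return sorted_items[skip:end]


names = sorted(["Mike", "Jordan", "Kevin", "Harden", "Mark"])
print(skip_limit(names, skip=2))           # ['Kevin', 'Mark', 'Mike']
print(skip_limit(names, skip=2, limit=2))  # ['Kevin', 'Mark']
```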
Note that when a collection is very large, holding tens or hundreds of millions of documents, it is best not to query with large offsets: the server still has to scan past every skipped document, so such queries become slower as the offset grows. In this case, you can query by range instead, for example:
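from bson.objectid import ObjectId
# Next page: documents whose _id is greater than the last _id seen, in _id order, two per page
results = collection.find({'_id': {'$gt': ObjectId('593278c815c2602678bb2b8d')}}).sort('_id', pymongo.ASCENDING).limit(2)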
For this to work, you need to record the last _id returned by the previous query.
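This approach is usually called keyset (or range-based) pagination: sort by _id, remember the last _id of each page, and ask for documents with a larger _id next time, so no documents have to be skipped. The in-memory sketch below shows the loop with plain integers standing in for ObjectIds; the function fetch_page is hypothetical and plays the role of collection.find({'_id': {'$gt': last_id}}).sort('_id', 1).limit(page_size).

```python
def fetch_page(docs, last_id, page_size):
    """Stand-in for find({'_id': {'$gt': last_id}}).sort('_id', 1).limit(page_size).

    last_id=None means the first page, i.e. no _id filter.
    """
    newer = [d for d in docs if last_id is None or d["_id"] > last_id]
    newer.sort(key=lambda d: d["_id"])
    return newer[:page_size]


def iterate_pages(docs, page_size):
    """Yield successive pages, remembering the last _id instead of an offset."""
    last_id = None
    while True:
        page = fetch_page(docs, last_id, page_size)
        if not page:
            return
        yield page
        last_id = page[-1]["_id"]  # resume after this _id on the next request


docs = [{"_id": i, "name": f"student{i}"} for i in (5, 1, 4, 2, 3)]
pages = [[d["_id"] for d in page] for page in iterate_pages(docs, page_size=2)]
print(pages)  # [[1, 2], [3, 4], [5]]
```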
10. Update
To update data, we can use the update() method, passing the update condition and the updated data. For example:
condition = {'name': 'Kevin'}
student = collection.find_one(condition)
student['age'] = 25
# update() was removed in PyMongo 4.0; there, use update_one() or replace_one()
result = collection.update(condition, student)
print(result)
Here we want to update the age of the record named Kevin: first we specify the query condition and fetch the record, then we change its age and call update(), passing in the condition and the modified data.
The output is as follows:
{'ok': 1, 'nModified': 1, 'n': 1, 'updatedExisting': True}
The result is returned as a dictionary: ok indicates successful execution, nModified is the number of documents modified, and n is the number of documents matched.
Alternatively, we can update the data using the $set operator as follows:
result = collection.update(condition, {'$set': student})
With $set, only the fields present in the student dictionary are updated; any other fields in the stored document are neither changed nor removed. Without $set, the student dictionary replaces the whole existing document, so any fields not in it are removed.
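The difference between the two forms can be mimicked with plain dictionaries. This is only an in-memory illustration: the helper names apply_set and apply_replacement are hypothetical, and the replacement helper keeps _id, which is what MongoDB does when a document is replaced.

```python
def apply_set(document, changes):
    """Mimic {'$set': changes}: overwrite or add the given fields, keep the rest."""
    updated = dict(document)  # copy, so the stored document is left untouched
    updated.update(changes)
    return updated


def apply_replacement(document, replacement):
    """Mimic a replacement update: the new fields replace everything except _id."""
    updated = {"_id": document["_id"]}
    updated.update({k: v for k, v in replacement.items() if k != "_id"})
    return updated


stored = {"_id": 1, "name": "Kevin", "age": 20, "gender": "male"}
changes = {"name": "Kevin", "age": 25}

print(apply_set(stored, changes))
# {'_id': 1, 'name': 'Kevin', 'age': 25, 'gender': 'male'}
print(apply_replacement(stored, changes))
# {'_id': 1, 'name': 'Kevin', 'age': 25}
```

In the example above, student was fetched in full with find_one(), so both forms happen to give the same result; the difference only shows when the dictionary carries a subset of the fields.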
The update() method is also officially deprecated (and was removed in PyMongo 4.0). The recommended replacements are update_one() and update_many(), which are stricter: their second argument must use $ operators as its keys. For example:
condition = {'name': 'Kevin'}
student = collection.find_one(condition)
student['age'] = 26
result = collection.update_one(condition, {'$set': student})
print(result)
print(result.matched_count, result.modified_count)
Here update_one() is called. Instead of the modified dictionary itself, the second argument must take the form {'$set': student}. The return value is an UpdateResult object, whose matched_count and modified_count attributes give the number of matched and modified documents, respectively.
The output is as follows:
<pymongo.results.UpdateResult object at 0x10d17b678>
1 0
A modified_count of 0 means the document matched but already held these values, so nothing was actually changed.
Let’s look at another example:
condition = {'age': {'$gt': 20}}
result = collection.update_one(condition, {'$inc': {'age': 1}})
print(result)
print(result.matched_count, result.modified_count)
Here the query condition selects records whose age is greater than 20, and the update {'$inc': {'age': 1}} increments age by 1. Because update_one() is used, only the first matching record has its age increased by 1.
The output is as follows:
<pymongo.results.UpdateResult object at 0x10b8874c8>
1 1
You can see that both the number of matched documents and the number of modified documents are 1.
If update_many() is called instead, all matching documents are updated:
condition = {'age': {'$gt': 20}}
result = collection.update_many(condition, {'$inc': {'age': 1}})
print(result)
print(result.matched_count, result.modified_count)
This time the number of matched documents is no longer 1; the output is as follows:
<pymongo.results.UpdateResult object at 0x10c6384c8>
3 3
As you can see, all matching documents are updated.
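The contrast between update_one() and update_many() with $inc can be reproduced on a list of dictionaries. In this hypothetical sketch, apply_inc updates either the first match or every match and returns (matched, modified) counts, analogous to matched_count and modified_count; the sample ages are made up so that three documents match, as in the output above. The real update_one() modifies whichever matching document the server finds first, which is not necessarily the first one inserted.

```python
def apply_inc(docs, predicate, field, amount, many):
    """Apply {'$inc': {field: amount}} to the first match, or to every match if many."""
    matched = modified = 0
    for doc in docs:
        if not predicate(doc):
            continue
        matched += 1
        doc[field] = doc.get(field, 0) + amount  # $inc treats a missing field as 0
        modified += 1
        if not many:
            break  # update_one() semantics: stop after the first match
    return matched, modified


def older_than_20(doc):
    return doc["age"] > 20


students = [
    {"name": "Jordan", "age": 20},
    {"name": "Mike", "age": 21},
    {"name": "Kevin", "age": 23},
    {"name": "Mark", "age": 22},
]

print(apply_inc(students, older_than_20, "age", 1, many=False))  # (1, 1)
print(apply_inc(students, older_than_20, "age", 1, many=True))   # (3, 3)
print([d["age"] for d in students])                              # [20, 23, 24, 23]
```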
11. Remove
Deletion is simple: call the remove() method with a deletion condition, and all documents matching it are deleted. For example:
# remove() was removed in PyMongo 4.0; there, use delete_one() or delete_many()
result = collection.remove({'name': 'Kevin'})
print(result)
The output is as follows:
{'ok': 1, 'n': 1}
In addition, there are two newer, recommended methods, delete_one() and delete_many(). For example:
result = collection.delete_one({'name': 'Kevin'})
print(result)
print(result.deleted_count)
result = collection.delete_many({'age': {'$lt': 25}})
print(result.deleted_count)
The output is as follows:
<pymongo.results.DeleteResult object at 0x10e6ba4c8>
1
4
delete_one() deletes the first document that meets the condition, and delete_many() deletes all documents that meet it. Both return a DeleteResult object, whose deleted_count attribute gives the number of deleted documents.
12. Other operations
PyMongo also provides some combined methods, such as find_one_and_delete(), find_one_and_replace(), and find_one_and_update(), which find a document and then delete, replace, or update it. They are used in much the same way as the methods above.
Indexes can be managed with methods such as create_index(), create_indexes(), and drop_index().
For detailed usage of PyMongo, see the official documentation: http://api.mongodb.com/python/current/api/pymongo/collection.html.
There are also operations on databases and collections themselves, which we won’t cover one by one here; see the official documentation: http://api.mongodb.com/python/current/api/pymongo/.
This section covered how to insert, query, update, and delete data in MongoDB using PyMongo.
This article was first published on Cui Qingcai’s personal blog, Jingmi, as part of the tutorial Python3 Web Crawler Development in Practice.
For more crawler-related content, follow my WeChat official account: Coder