In Python 2, most libraries that connect to MySQL use MySQLdb, but this library does not officially support Python 3, so PyMySQL is recommended.

In this section, we will explain how to use PyMySQL to operate MySQL database.

1. Preparation

Before you start, make sure you have your MySQL database installed and running properly, and you need to have the PyMySQL library installed.

2. Connect to the database

Here, try connecting to the database first. Assume that the user name, password, and port of the MySQL server are root, 123456, and 3306 respectively. We use PyMySQL to connect to MySQL and create a new database called spiders.

import pymysql

db = pymysql.connect(host='localhost',user='root', password='123456', port=3306)
cursor = db.cursor()
cursor.execute('SELECT VERSION()')
data = cursor.fetchone()
print('Database version:', data)
cursor.execute("CREATE DATABASE spiders DEFAULT CHARACTER SET utf8")
db.close()Copy the code

The running results are as follows:

Database version: ('5.6.22'.)Copy the code

Here we declare a MySQL connection object db via PyMySQL’s connect() method, passing in the host (IP) for MySQL to run. Since MySQL is running locally, localhost is passed in. If MySQL is running remotely, pass in its public IP address. The following parameters are user, password, and port (3306 by default).

After the connection is successful, you need to call the cursor() method to obtain the MySQL operation cursor, and use the cursor to execute the SQL statement. Here we execute two SQL statements, using the execute() method. The first SQL statement is used to get the current version of MySQL, and the fetchone() method is called to get the first data, which is the version number. The second SQL statement performs the operation of creating a database called spiders, with the default encoding UTF-8. Since this statement is not a query, the spiders of the database are successfully created when executed directly. This database is then used for subsequent operations.

3. Create a table

Typically, creating a database only needs to be done once. Of course, we can also create the database manually. From now on, all our operations are performed on the spiders database.

After the database is created, you need to specify an additional parameter db when you connect.

Next, create a new table, STUDENTS, and execute the SQL statement to create the table. Three fields are specified, and the structure is shown below.

The field name meaning type
id Student id varchar
name The name varchar
age age int

Example code for creating this table is as follows:

import pymysql

db = pymysql.connect(host='localhost', user='root', password='123456', port=3306, db='spiders')
cursor = db.cursor()
sql = 'CREATE TABLE IF NOT EXISTS students (id VARCHAR(255) NOT NULL, name VARCHAR(255) NOT NULL, age INT NOT NULL, PRIMARY KEY (id))'
cursor.execute(sql)
db.close()Copy the code

After running, we created a data table named STUDENTS.

Of course, only the simplest fields are specified here for demonstration purposes. In fact, during the crawler process, we design specific fields based on the crawler results.

4. Insert data

The next step is to insert data into the database. For example, a student id is 20120001, name is Bob, age is 20, so how to insert this data into the database? Example code is as follows:

import pymysql

id = '20120001'
user = 'Bob'
age = 20

db = pymysql.connect(host='localhost', user='root', password='123456', port=3306, db='spiders')
cursor = db.cursor()
sql = 'INSERT INTO students(id, name, age) values(%s, %s, %s)'
try:
    cursor.execute(sql, (id, user, age))
    db.commit()
except:
    db.rollback()
db.close()Copy the code

Here we construct an SQL statement whose Value is not a string concatenation, as in:

sql = 'INSERT INTO students(id, name, age) values(' + id + ', ' + name + ', ' + age + ') 'Copy the code

This is cumbersome and unintuitive, so we choose to use the %s formatter directly. We pass in the first argument to the execute() method, and the Value is passed in a uniform tuple. This avoids both string concatenation and conflicting quotes.

Note later that data insertion requires the COMMIT () method of the DB object, which actually commits the statement to the database for execution. You need to call this method to insert, update, and delete data.

Next, we added a layer of exception handling. If it fails, rollback() is called to rollback the data as if nothing had happened.

There is the issue of transactions. The transaction mechanism ensures data consistency, that is, the event either happened or did not happen. If you insert a piece of data, you don’t insert half of it, you insert all of it, or you insert none of it, that’s the atomicity of a transaction. In addition, transactions have three attributes — consistency, isolation, and persistence. These four properties are commonly referred to as ACID properties, as described in the following table.

attribute explain
Atomicity A transaction is an indivisible unit of work in which all or none of the operations involved are performed
Consistency Transactions must change the database from one consistent state to another. Consistency is closely related to atomicity
Isolation The execution of a transaction cannot be interfered by other transactions, that is, the operations and data used within a transaction are isolated from other concurrent transactions, and the concurrent transactions cannot interfere with each other
“Durability” Permanence, also known as permanence, means that once a transaction is committed, its changes to data in the database should be permanent. Subsequent operations or failures should not affect it in any way

Insert, update, and delete operations are all operations that make changes to the database, and the change operation must be a transaction, so the standard way to write these operations is:

try:
    cursor.execute(sql)
    db.commit()
except:
    db.rollback()Copy the code

This ensures data consistency. The COMMIT () and rollback() methods here provide support for transaction implementation.

The above insert operation is implemented by constructing the SQL statement, but it is obvious that there is an extremely inconvenient place, such as the sudden addition of the gender field, where the SQL statement needs to be changed to:

INSERT INTO students(id, name, age, gender) values(%s, %s, %s, %s)Copy the code

The corresponding tuple parameter needs to be changed to:

(id, name, age, gender)Copy the code

That’s obviously not what we want. In many cases, the goal is to leave the insertion method unchanged, to make it a generic method, and just pass in a dynamically changing dictionary. For example, construct a dictionary like this:

{
    'id': '20120001'.'name': 'Bob'.'age'20} :Copy the code

The SQL statement is then dynamically constructed from the dictionary, and the tuple is also dynamically constructed so that the generic insert method can be implemented. So, here we need to rewrite the insert method:

data = {
    'id': '20120001'.'name': 'Bob'.'age': 20
}
table = 'students'
keys = ', '.join(data.keys())
values = ', '.join(['%s'] * len(data))
sql = 'INSERT INTO {table}({keys}) VALUES ({values})'.format(table=table, keys=keys, values=values)
try:
   if cursor.execute(sql, tuple(data.values())):
       print('Successful')
       db.commit()
except:
    print('Failed')
    db.rollback()
db.close()Copy the code

Here we pass in a dictionary and define it as a data variable. The table name is also defined as the variable table. Next, you need to construct a dynamic SQL statement.

First, you need to construct the inserted fields ID, name, and age. Just take the key name of data and separate it with a comma. So ‘, ‘. Join (data.keys()) results in id, name, age, and then construct more than %s as placeholders. For example, if there are three fields, construct %s, %s, %s. Here, first of all, defines the length of 1 array (‘ % s’), then use the multiplication to expand its for [‘ % s’, ‘% s’,’ % s’], then call the join () method, which eventually become % s, % s, % s. Finally, we use the format() method of strings to construct table names, field names, and placeholders. The final SQL statement is dynamically constructed as:

INSERT INTO students(id, name, age) VALUES (%s, %s, %s)Copy the code

Finally, the first argument to the execute() method is passed in an SQL variable and the second argument is passed in a tuple constructed from the key value of data, and the data is successfully inserted.

This way, we can pass in a dictionary to insert data without having to modify the SQL statement or insert operation.

5. Update data

Data update operations actually execute SQL statements. The simplest way is to construct an SQL statement and execute:

sql = 'UPDATE students SET age = %s WHERE name = %s'
try:
   cursor.execute(sql, (25, 'Bob'))
   db.commit()
except:
   db.rollback()
db.close()Copy the code

Here again, the SQL is constructed as placeholders, and the execute() method is executed, passing in tuples of arguments, and the commit() method is executed as well. If you want to do simple data updates, you can use this method.

However, in the actual data fetching process, most of the cases need to insert data, but we are concerned about whether there will be duplicate data, if there is, we want to update the data rather than save it again. In addition, as mentioned earlier in the problem of dynamically constructing SQL, it is possible to implement another method of de-duplication, updating the data if it exists; If the data does not exist, it is inserted. In addition, this approach supports flexible dictionary pass-throughs. The following is an example:

data = {
    'id': '20120001'.'name': 'Bob'.'age': 21
}

table = 'students'
keys = ', '.join(data.keys())
values = ', '.join(['%s'] * len(data))

sql = 'INSERT INTO {table}({keys}) VALUES ({values}) ON DUPLICATE KEY UPDATE'.format(table=table, keys=keys, values=values)
update = ', '.join([" {key} = %s".format(key=key) for key in data])
sql += update
try:
    if cursor.execute(sql, tuple(data.values())*2):
        print('Successful')
        db.commit()
except:
    print('Failed')
    db.rollback()
db.close()Copy the code

The SQL statement is an insert statement, but we add ON DUPLICATE KEY UPDATE. This line of code means that the update operation is performed if the primary key already exists. For example, the data ID we passed in is still 20120001, but the age has changed from 20 to 21. At this time, this data will not be inserted, but the data with THE ID of 20120001 will be directly updated. The complete SQL construct looks like this:

INSERT INTO students(id, name, age) VALUES (%s, %s, %s) ON DUPLICATE KEY UPDATE id = %s, name = %s, age = %sCopy the code

So this becomes 6 percent s. So the second parameter tuple of the execute() method later needs to be multiplied by 2 to double its original value.

In this way, we can insert data if the primary key does not exist, and update data if it does exist.

6. Delete data

The DELETE operation is relatively simple, using the DELETE statement, specifying the name of the target table to be dropped and the DELETE condition, and still requires db’s commit() method to take effect. The following is an example:

table = 'students'
condition = 'age > 20'

sql = 'DELETE FROM {table} WHERE {condition}'.format(table=table, condition=condition)
try:
    cursor.execute(sql)
    db.commit()
except:
    db.rollback()

db.close()Copy the code

Because there are a variety of delete conditions, operators are greater than, less than, equal to, LIKE, etc., AND conditional connectors are AND, OR, etc., so we will not continue to construct complex judgment conditions. The condition is passed directly as a string for deletion.

7. Query data

After inserting, modifying, and deleting, there is one very important operation left, which is query. The query uses the SELECT statement as shown in the following example:

sql = 'SELECT * FROM students WHERE age >= 20'

try:
    cursor.execute(sql)
    print('Count:', cursor.rowcount)
    one = cursor.fetchone()
    print('One:', one)
    results = cursor.fetchall()
    print('Results:', results)
    print('Results Type:'.type(results))
    for row in results:
        print(row)
except:
    print('Error')Copy the code

The running results are as follows:

Count: 4
One: ('20120001'.'Bob', 25)
Results: (('20120011'.'Mary', 21), ('20120012'.'Mike', 20),'20120013'.'James', 22))
Results Type: <class 'tuple'>
('20120011'.'Mary'(21),'20120012'.'Mike', (20)'20120013'.'James', 22)Copy the code

Here we construct an SQL statement that queries students aged 20 and older and passes it to the execute() method. Note that db’s commit() method is no longer required. Next, call the Rowcount property of the cursor to get the number of query results, which is 4 in the current example.

We then call the fetchone() method, which fetches the first value of the result and returns the result as a tuple. The order of the elements in the tuple corresponds to the order of the fields, i.e. the first element is the first field ID, the second element is the second field name, and so on. We then call the fetchAll () method, which gets all the data for the result. We then print out the result and type, which is a double tuple, where each element is a record, and we iterate it out.

But there’s one thing to notice here. Instead of showing four data pieces, doesn’t the fetchall() method fetchall the data? This is because its internal implementation has an offset pointer that points to the query result. Initially, the offset pointer points to the first piece of data, and after fetching it once, the pointer moves to the next piece of data, so that the next piece of data is fetched. We initially called the fetchone() method once, so that the offset pointer to the result points to the next data, and fetchall() returns all the data that the offset pointer points to up to the end, so this method fetches only three results.

Alternatively, instead of fetchall() fetching all the data at once, we can use a while loop to fetchall the data. Fetchall () returns all the results as tuples, which can be very expensive if the data is large. Therefore, it is recommended to use the following method to fetch data item by item:

sql = 'SELECT * FROM students WHERE age >= 20'
try:
    cursor.execute(sql)
    print('Count:', cursor.rowcount)
    row = cursor.fetchone()
    while row:
        print('Row:', row)
        row = cursor.fetchone()
except:
    print('Error')Copy the code

In this way, the pointer will offset one piece of data each time through the loop.

In this section, we explained how to use PyMySQL to manipulate the MySQL database and the construction of some SQL statements.

This resource starting in Cui Qingcai personal blog still find: Python3 tutorial | static find web crawler development practical experience

For more crawler information, please follow my personal wechat official account: Attack Coder

Weixin.qq.com/r/5zsjOyvEZ… (Qr code automatic recognition)