If you do data mining, data processing, or data analysis in Python, you will inevitably work with MySQL and SQL Server databases. Below is a simple wrapper I commonly use to connect to MySQL and SQL Server from Python. Most of my work is pulling data down for mining and analysis and then writing the processed results back, so the wrapper only covers querying and writing.


Install the dependencies from the terminal with pip:

pip install pymysql
pip install pymssql
pip install pandas

Now let's go straight to the code.

# -*- coding: utf-8 -*-
import pymssql
import pymysql


class DataBase:
    def __init__(self, host=None, port=None, user=None, pwd=None, db=None, conn=None):
        self.host = host
        self.port = port
        self.user = user
        self.pwd = pwd
        self.db = db
        self.conn = conn

    def get_cursor(self):
        raise NotImplementedError

    def query(self, sql):
        with self.get_cursor() as cur:
            cur.execute(sql)
            # cur.description holds one tuple per result column;
            # the first element of each tuple is the column name
            description_field = [i[0] for i in cur.description]
            return description_field, cur.fetchall()

    def query_one(self, sql):
        with self.get_cursor() as cur:
            cur.execute(sql)
            return cur.fetchone()

    def commit_sql(self, sql):
        with self.get_cursor() as cur:
            cur.execute(sql)
            self.conn.commit()

    def commit_many_sql(self, sql, param):
        with self.get_cursor() as cur:
            cur.executemany(sql, param)
            self.conn.commit()


class MssqlDataBase(DataBase):
    def __init__(self, host=None, port=None, user=None, pwd=None, db=None, conn=None):
        super(MssqlDataBase, self).__init__(host=host, port=port, user=user,
                                            pwd=pwd, db=db, conn=conn)
        self.cursor = None

    def get_cursor(self):
        if not self.conn:
            self.conn = pymssql.connect(host=self.host, port=self.port,
                                        user=self.user, password=self.pwd,
                                        database=self.db, charset="utf8")
        self.cursor = self.conn.cursor()
        if not self.cursor:
            raise NameError("failed to connect to database")
        return self.cursor


class MysqlDataBase(DataBase):
    def __init__(self, host=None, port=None, user=None, pwd=None, db=None, conn=None):
        super(MysqlDataBase, self).__init__(host=host, port=port, user=user,
                                            pwd=pwd, db=db, conn=conn)
        self.cursor = None

    def connect(self):
        self.conn = pymysql.connect(host=self.host, port=self.port,
                                    user=self.user, password=self.pwd,
                                    database=self.db, charset="utf8")

    def get_cursor(self, is_sscursor=None):
        if not self.conn:
            self.connect()
        if is_sscursor:
            # SSCursor is an unbuffered (streaming) cursor: rows stay on the
            # server and are fetched one at a time instead of being cached
            # in client memory
            self.cursor = self.conn.cursor(cursor=pymysql.cursors.SSCursor)
        else:
            self.cursor = self.conn.cursor()
        if not self.cursor:
            raise NameError("failed to connect to database")
        return self.cursor

    def get_sscursor(self, sql):
        cur = self.get_cursor(is_sscursor=True)
        cur.execute(sql)
        return cur

    def ping(self, reconnect=True):
        if self.conn is None:
            self.connect()
        self.conn.ping(reconnect=reconnect)
        self.cursor = None
def get_mysql_world():
    # Fill in your own connection details; they are omitted here.
    return MysqlDataBase(host='', port=3306, user='', pwd='', db='world')
if __name__ == '__main__':
    import pandas as pd
    res_field, res = get_mysql_world().query('select * from city limit 1')
    print(pd.DataFrame(res, columns=res_field))
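
The SQL Server wrapper is used the same way. A minimal sketch, assuming placeholder credentials and a table of your own (get_mssql_demo and some_table below are hypothetical names):

def get_mssql_demo():
    # Hypothetical factory mirroring get_mysql_world; fill in real credentials.
    return MssqlDataBase(host='', port=1433, user='', pwd='', db='demo')

fields, rows = get_mssql_demo().query('select top 1 * from some_table')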

Pandas also has APIs for reading from and writing to MySQL (read_sql and to_sql). However, when there is a large amount of data, going through pandas is very time-consuming; reading and writing directly through pymysql, as above, may be around ten times faster.
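
For comparison, the pandas route looks roughly like this. This is a minimal sketch: to_sql needs an SQLAlchemy engine rather than a raw pymysql connection, and the credentials and city_copy table name below are placeholders:

import pandas as pd
from sqlalchemy import create_engine

# Placeholder credentials; adjust to your environment.
engine = create_engine('mysql+pymysql://user:password@127.0.0.1:3306/world?charset=utf8')

df = pd.read_sql('select * from city', engine)                     # read into a DataFrame
df.to_sql('city_copy', engine, if_exists='replace', index=False)   # write back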

Streaming cursors

When a MySQL table holds a large amount of data, you can use a streaming cursor.

Suppose Python needs to read tens of millions of rows from MySQL through pymysql.

With the traditional fetchall() or fetchone() methods, the default cursor transfers and caches the entire result set in client memory as soon as the query executes, so a large result set can exhaust memory.

In this case, it is recommended to use SSCursor (a streaming, unbuffered cursor) so the client does not hold the whole result set in memory.

A streaming cursor does not cache any rows on the client and never reads the whole result set into memory. It fetches records from the server and returns them one by one, so you iterate over the cursor instead of calling fetchall(); this keeps memory usage low and starts returning data quickly.
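
Putting that together with the wrapper above, a minimal sketch of streaming through a large table (the per-row handling is up to you):

db = get_mysql_world()
cur = db.get_sscursor('select * from city')
for row in cur:      # the SSCursor yields rows one at a time from the server
    print(row)       # replace with your own per-row processing
cur.close()
db.conn.close()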


I hope this is helpful. If you spot any problems, please point them out in the comments. Thanks!

If you found it useful, a like or follow would be much appreciated.