If you do data mining, data processing, or data analysis in Python, you will inevitably work with MySQL and SQL Server databases. Below is a simple wrapper I commonly use to connect to MySQL and SQL Server from Python. Most of my work is pulling data down for mining and analysis and then writing the processed results back, so the wrapper only covers querying and writing.
Install the dependencies directly from the terminal with pip:
pip install pymysql
pip install pymssql
pip install pandas
Now, straight to the code:
# -*- coding: utf-8 -*-
import pymssql
import pymysql


class DataBase(object):
    def __init__(self, host=None, port=None, user=None, pwd=None, db=None, conn=None):
        self.host = host
        self.port = port
        self.user = user
        self.pwd = pwd
        self.db = db
        self.conn = conn

    def get_cursor(self):
        raise NotImplementedError

    def query(self, sql):
        with self.get_cursor() as cur:
            cur.execute(sql)
            # cur.description holds one tuple per result column;
            # the first element of each tuple is the column name
            description_field_info = cur.description
            description_field = []
            for i in description_field_info:
                description_field.append(i[0])
            return description_field, cur.fetchall()

    def query_one(self, sql):
        with self.get_cursor() as cur:
            cur.execute(sql)
            return cur.fetchone()

    def commit_sql(self, sql):
        with self.get_cursor() as cur:
            cur.execute(sql)
            self.conn.commit()

    def commit_many_sql(self, sql, param):
        with self.get_cursor() as cur:
            cur.executemany(sql, param)
            self.conn.commit()


class MssqlDataBase(DataBase):
    def __init__(self, host=None, port=None, user=None, pwd=None, db=None, conn=None):
        super(MssqlDataBase, self).__init__(host=host, port=port, user=user,
                                            pwd=pwd, db=db, conn=conn)
        self.cursor = None

    def get_cursor(self):
        if self.conn:
            self.cursor = self.conn.cursor()
            if not self.cursor:
                raise NameError("failed to connect to database")
            return self.cursor
        else:
            self.conn = pymssql.connect(host=self.host, port=self.port,
                                        user=self.user, password=self.pwd,
                                        database=self.db, charset="utf8")
            self.cursor = self.conn.cursor()
            if not self.cursor:
                raise NameError("failed to connect to database")
            return self.cursor


class MysqlDataBase(DataBase):
    def __init__(self, host=None, port=None, user=None, pwd=None, db=None, conn=None):
        super(MysqlDataBase, self).__init__(host=host, port=port, user=user,
                                            pwd=pwd, db=db, conn=conn)
        self.cursor = None

    def connect(self):
        self.conn = pymysql.connect(host=self.host, port=self.port,
                                    user=self.user, password=self.pwd,
                                    database=self.db, charset="utf8")

    def get_cursor(self, is_sscursor=None):
        if self.conn:
            if is_sscursor:
                # a streaming cursor on the existing connection
                self.cursor = pymysql.cursors.SSCursor(self.conn)
            else:
                self.cursor = self.conn.cursor()
            if not self.cursor:
                raise NameError("failed to connect to database")
            return self.cursor
        else:
            self.connect()
            if is_sscursor:
                self.cursor = self.conn.cursor(cursor=pymysql.cursors.SSCursor)
            else:
                self.cursor = self.conn.cursor()
            if not self.cursor:
                raise NameError("failed to connect to database")
            return self.cursor

    def get_sscursor(self, sql):
        # execute sql with a streaming cursor and hand the cursor back,
        # so the caller can iterate over the result row by row
        cur = self.get_cursor(is_sscursor=True)
        cur.execute(sql)
        return cur

    def ping(self, reconnect=True):
        if self.conn is None:
            self.connect()
        self.conn.ping(reconnect=reconnect)
        self.cursor = None
def get_mysql_world():
    # host, user and password left blank here on purpose; fill in your own
    return MysqlDataBase(host='', port=3306, user='', pwd='', db='world')
if __name__ == '__main__':
    import pandas as pd

    res_field, res = get_mysql_world().query('select * from city limit 1')
    print(pd.DataFrame(res, columns=res_field))
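The write side works the same way. Here is a minimal sketch of a batch insert through commit_many_sql; the city_copy target table is made up for illustration (its columns mirror the world database's city table). executemany takes a parameterized statement plus a sequence of parameter tuples:

db = get_mysql_world()
rows = [('Foo', 'NLD', 'Noord-Holland', 100),
        ('Bar', 'NLD', 'Noord-Holland', 200)]
sql = ('insert into city_copy (Name, CountryCode, District, Population) '
       'values (%s, %s, %s, %s)')
db.commit_many_sql(sql, rows)

MssqlDataBase is used the same way; pymssql also uses %s placeholders.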
Pandas also has its own API for reading from and writing to MySQL (read_sql and to_sql). However, with a large amount of data, reading and writing through pandas is very slow; going through the wrapper above may be about ten times faster.
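For reference, the pandas route looks roughly like this. pd.read_sql and DataFrame.to_sql are standard pandas APIs, but to_sql needs a SQLAlchemy engine for MySQL; the connection string and the city_copy target table below are placeholders:

import pandas as pd
from sqlalchemy import create_engine

# placeholder connection string; substitute your own credentials
engine = create_engine('mysql+pymysql://user:password@localhost:3306/world')

df = pd.read_sql('select * from city', engine)                    # read
df.to_sql('city_copy', engine, if_exists='append', index=False)   # write; slow for big frames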
Streaming cursors
When a MySQL table is large, you can use a streaming cursor.

When Python reads tens of millions of rows from MySQL through pymysql, the traditional fetchall()/fetchone() route uses the default buffered cursor, which caches the entire result set in client memory before you process it, so a large result set can exhaust memory.

In that case it is better to use SSCursor (a streaming cursor), which keeps the client from holding the whole result set at once.

A streaming cursor does not actually cache any data and does not read everything into memory; it fetches records from the server and returns them to you one by one. Iterating over the cursor instead of calling fetchall() both saves memory and starts returning data quickly, as in the sketch below.
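A minimal sketch of draining a large table with the wrapper's get_sscursor, assuming the get_mysql_world helper above:

db = get_mysql_world()
cur = db.get_sscursor('select * from city')
for row in cur:     # pymysql cursors are iterators; rows stream in one at a time
    print(row)      # replace with your own row handling
cur.close()
db.conn.close()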