1. The scenario
A few days ago, a reader messaged me through the official account. He had joined dozens of WeChat groups but did not have the energy to follow them all, so he wanted to pick out the high-quality ones and asked whether I could think of a way to help.
In fact, all WeChat group chat records are stored in a local folder on the phone. Export the database, decrypt it, run some data analysis on it, and you can help him pick out the high-quality groups.
This article walks through the implementation step by step in Python.
2. Implementation steps
Step 1: Export the WeChat chat record database
First, log in to WeChat on a rooted phone or an emulator, locate the WeChat chat record database, and export it to your computer.
The full path to the database file is as follows:
# Full path to the WeChat chat record database
/data/data/com.tencent.mm/MicroMsg/[random string for the currently logged-in WeChat account]/EnMicroMsg.db
Note that if the current device is not rooted, you can migrate the group chat messages to a rooted device or emulator and export the database from there.
Step 2: Get the database password
The WeChat database password is built as follows: concatenate the phone's IMEI and the WeChat UIN, compute the MD5 hash (32 lowercase hexadecimal characters), and take the first 7 characters.
The phone's IMEI can be obtained by dialing *#06#. On a dual-SIM phone there are two IMEIs, so you need to work out which one applies.
The UIN is the value of the entry whose name attribute is default_uin in the following configuration file:
# Configuration file of the currently logged-in WeChat account
/data/data/com.tencent.mm/shared_prefs/system_config_prefs.xml
Finally, concatenate the IMEI and the UIN into one string and hash it with MD5. The first 7 characters of the 32-character lowercase digest are the WeChat database password.
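The derivation described above can be sketched in a few lines of Python. This is a minimal illustration, not code from the original project: the function name wechat_db_password and the sample IMEI and UIN values are placeholders invented for the example.

```python
import hashlib


def wechat_db_password(imei: str, uin: str) -> str:
    """Return the first 7 characters of the lowercase MD5 hex digest of IMEI + UIN."""
    # hexdigest() yields 32 lowercase hexadecimal characters
    digest = hashlib.md5((imei + uin).encode("utf-8")).hexdigest()
    return digest[:7]


# Placeholder values for illustration only, not real identifiers
password = wechat_db_password("123456789012345", "1234567890")
print(password)
```

On a dual-SIM phone, one option is to compute a candidate password for each IMEI and try both.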
Step 3: Decrypt the database
Because the WeChat database is encrypted with SQLCipher, install the SQLCipher command-line tool first.
# Install the SQLCipher command-line tool (Mac)
brew install sqlcipher
# On Windows, download the SQLCipher command-line binary
Then enter the database password and the decryption settings, and export the decrypted database.
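The article does not list the exact statements to enter, so the following is only a sketch of what the decryption step might look like. It builds the statements to feed to the sqlcipher shell. PRAGMA key, ATTACH ... KEY '' and sqlcipher_export() are standard SQLCipher features, but the three cipher settings (HMAC off, 4000 KDF iterations, 1024-byte pages) are assumptions based on commonly reported legacy settings and may need adjusting for your WeChat and SQLCipher versions. The function name and output file name are hypothetical.

```python
def build_decrypt_script(password: str, plain_db: str = "decrypted.db") -> str:
    """Build the statements to run inside the sqlcipher shell (settings are assumptions)."""
    statements = [
        f"PRAGMA key = '{password}';",       # password from step 2
        "PRAGMA cipher_use_hmac = OFF;",     # assumed legacy setting
        "PRAGMA kdf_iter = 4000;",           # assumed legacy setting
        "PRAGMA cipher_page_size = 1024;",   # assumed legacy setting
        f"ATTACH DATABASE '{plain_db}' AS plain KEY '';",  # empty key = unencrypted target
        "SELECT sqlcipher_export('plain');",  # copy everything into the plain database
        "DETACH DATABASE plain;",
    ]
    return "\n".join(statements)


script = build_decrypt_script("abcdefg")  # "abcdefg" is a placeholder password
print(script)
```

One way to use the result is to save it to a file and pipe it to the command-line tool, for example sqlcipher EnMicroMsg.db < decrypt.sql, where decrypt.sql is a hypothetical file name.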
Step 4: Analyze the database
SQLiteStudio is a convenient tool for opening and inspecting the decrypted database. Focus on three tables: message, rcontact, and chatroom.
The message table stores all WeChat text chat records, including the chat content, sender, message type, creation time, and so on.
The rcontact table is the WeChat address book, including WeChat ID, nickname, and remark name.
The chatroom table holds group chat information, including the group chat ID and the member list.
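If you prefer to check from Python that the three tables are present, a small sketch like the following may help. It runs against an in-memory stand-in database so that it is self-contained. The helper name list_tables is hypothetical, and the columns of the stand-in tables are simplified assumptions (the chatroomname and memberlist column names in particular are not confirmed by this article).

```python
import sqlite3


def list_tables(conn: sqlite3.Connection) -> list:
    """Return the names of all tables in the database, sorted alphabetically."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name"
    ).fetchall()
    return [row[0] for row in rows]


# In-memory stand-in for the decrypted file; with real data you would use
# sqlite3.connect("path/to/decrypted.db") instead.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE message (talker TEXT, content TEXT, isSend INTEGER)")
conn.execute("CREATE TABLE rcontact (username TEXT, nickname TEXT)")
conn.execute("CREATE TABLE chatroom (chatroomname TEXT, memberlist TEXT)")  # assumed columns

tables = list_tables(conn)
print(tables)
```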
Step 5: Open the database in Python and wrap it
Use sqlite3 to connect to the local database file and obtain the connection object and the cursor object.
import sqlite3

class WeixinDB:  # illustrative class name; the original shows only the methods
    def __init__(self, db_path="./weixin.db"):
        """Local database initialization"""
        self.db = sqlite3.connect(db_path)
        self.cursor = self.db.cursor()
Next, wrap the common database operations (insert, delete, update, and query) in helper methods.
    def execute(self, sql, param=None):
        """
        Run a write statement (insert, delete, or update)
        sql: SQL statement
        param: data; can be a list, a tuple or dictionary, or None
        """
        try:
            if param is None:
                self.cursor.execute(sql)
            else:
                if type(param) is list:
                    # a list of rows is written in one batch
                    self.cursor.executemany(sql, param)
                else:
                    self.cursor.execute(sql, param)
            count = self.db.total_changes
            self.db.commit()
        except Exception as e:
            print(e)
            return False, e
        return True if count > 0 else False

    def query(self, sql, param=None):
        """
        Run a query statement
        sql: SQL statement
        param: parameters, may be None
        return: all matching rows
        """
        if param is None:
            self.cursor.execute(sql)
        else:
            self.cursor.execute(sql, param)
        return self.cursor.fetchall()
Step 6: Get the group chat ID from the group chat name
Query the rcontact table by the group chat's nickname with an SQL statement to obtain the group chat ID.
def __get_chartroom_id(self):
    """
    Get group chat ID
    :return:
    """
    res = self.db.query('select username from rcontact where nickname=?;', (self.chatroom_name,))
    # group chat id
    chatroom_id = res[0][0]
    return chatroom_id
Step 7: Get group chat messages
Once you have the group chat ID, query the message table to get all the messages in that group chat.
# isSend=0: sent by others; isSend=1: sent by yourself
sql = "SELECT content FROM message WHERE talker='{}' and isSend=0".format(chatroom_id)
result = self.db.query(sql)
To keep only meaningful message content, filter out messages you sent yourself, system messages, red envelope messages, and similar content.
for item in result:
    # skip empty rows, XML payloads, system messages and red envelope messages
    if (not item or not item[0]
            or item[0].find('xml') != -1
            or item[0].find('sysmsg') != -1
            # one more item[0].find(...) check belongs here; its search string is missing from the source
            or item[0].find('chatroom') != -1
            or item[0].find('weixinhongbao') != -1):
        continue
    # drop the sender prefix and keep only the message body
    temps = item[0].split(':')
    if len(temps) < 2:
        # print('no sender prefix: ' + item[0])
        continue
    # sender
    send_from = item[0].split(':')[0]
    # message content
    send_msg = "".join(item[0].split(':')[1]).strip().replace("\"", "")
    # skip overly long messages
    if len(send_msg) > 200:
        continue
For messages sent by other group members, the sender prefix (the part before the colon) is stripped and only the message body is kept.
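The cleaning rules can also be collected into one standalone function, which makes them easy to try on sample data. This is a sketch under assumptions rather than the author's code: the function name clean_messages and the sample rows (including the "sender:newline body" layout) are invented for illustration. It uses str.partition, so colons inside the message body are preserved, which differs slightly from the split-based version, and it also skips messages whose body is empty.

```python
def clean_messages(rows):
    """Return (sender, body) pairs for rows that look like ordinary text messages."""
    noise = ("xml", "sysmsg", "chatroom", "weixinhongbao")
    cleaned = []
    for row in rows:
        content = row[0] if row else None
        # drop empty rows and rows containing any noise marker
        if not content or any(marker in content for marker in noise):
            continue
        # split once at the first colon: sender prefix, then message body
        sender, sep, body = content.partition(":")
        if not sep:
            continue  # no sender prefix found
        body = body.strip().replace('"', "")
        if not body or len(body) > 200:
            continue  # skip empty or overly long messages
        cleaned.append((sender, body))
    return cleaned


# Hypothetical sample rows shaped like the result of the query in step 7
sample = [
    ("wxid_a:\nGood morning",),
    ("wxid_b:\n<sysmsg type=\"x\"/>",),
    ("no prefix here",),
    ("",),
    ("wxid_a:\nMeeting at 10:30",),
]
result = clean_messages(sample)
print(result)
```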
Step 8: Generate a word cloud
Use jieba to segment the valid messages in the group into words, then use wordcloud to generate the word cloud.
import jieba
import matplotlib.pyplot as plt
from wordcloud import WordCloud

def generate_wordcloud(self, word):
    """
    Generate word cloud
    :param word: space-separated words
    :return:
    """
    img = WordCloud(font_path="./DroidSansFallbackFull.ttf", width=2000, height=2000, margin=2, collocations=False).generate(word)
    plt.imshow(img)
    plt.axis("off")
    plt.show()
    # save the picture
    img.to_file("{}.png".format("group"))

# word segmentation; words holds the text of all cleaned messages
temp = " ".join(jieba.cut(words, cut_all=True))
# generate the word cloud
self.generate_wordcloud(temp)
Step 9: Create a ranking table and insert data
To rank group chat activity, create a new table named top with three fields: ID, WeChat nickname, and message content.
def __create_top_table(self):
    """
    Create the top table
    :return:
    """
    result = self.db.execute("CREATE TABLE IF NOT EXISTS top(uid integer primary key, name varchar(200), msg varchar(200))")
Next, insert the sender ID and message content of each message from the previous step into the newly created top table.
msg_pre = []
for item in result:
    # sender
    send_from = item[0].split(':')[0]
    # message content
    send_msg = "".join(item[0].split(':')[1]).strip().replace("\"", "")
    msg_pre.append((send_from, send_msg))

# insert all rows into the top table in one batch
self.db.execute("insert into top(uid, name, msg) values (NULL, ?, ?);", msg_pre)
Step 10: Get the activity ranking and visualize it
Query the top table for the number of messages sent by each member, grouped by name, and save the results in a list.
def get_top_partner(self):
    """
    Top 15 most active members
    :return:
    """
    sql = "SELECT name as name, COUNT(*) as times FROM top GROUP BY name ORDER BY times DESC LIMIT %d;" % self.top_num
    result = self.db.query(sql)
    for item in result:
        # user id
        id = item[0]
        # number of messages sent
        count = item[1]
        # look up the nickname for this id
        username = self.get_username(id)
        self.top_data.append({'username': username, 'count': count})
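To see the ranking table and the GROUP BY query working together without a real chat database, here is a self-contained sketch that uses an in-memory SQLite database. The sender IDs and messages are made-up sample data, and the LIMIT value is passed as a bound parameter instead of being formatted into the SQL string, which is a small deviation from the code above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE IF NOT EXISTS top("
    "uid INTEGER PRIMARY KEY, name VARCHAR(200), msg VARCHAR(200))"
)

# Made-up (sender, message) rows standing in for the cleaned messages
rows = [
    ("wxid_a", "hi"),
    ("wxid_b", "hello"),
    ("wxid_a", "lunch?"),
    ("wxid_c", "ok"),
    ("wxid_a", "bye"),
    ("wxid_b", "see you"),
]
# uid is NULL so SQLite assigns the primary key automatically
conn.executemany("INSERT INTO top(uid, name, msg) VALUES (NULL, ?, ?);", rows)
conn.commit()

top_num = 2
ranking = conn.execute(
    "SELECT name, COUNT(*) AS times FROM top "
    "GROUP BY name ORDER BY times DESC LIMIT ?;",
    (top_num,),
).fetchall()
print(ranking)
```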
Finally, remove the special symbols from the WeChat nicknames and use pyecharts to visualize the data.
from pyecharts import options as opts
from pyecharts.charts import Bar
from pyecharts.render import make_snapshot
from snapshot_selenium import snapshot as driver

def draw_image(self):
    """
    Data visualization
    :return:
    """
    usernames = []
    counts = []
    for user in self.top_data:
        # strip special symbols from the nickname and keep the first 8 characters
        usernames.append(get_ava_string(user.get('username').strip())[0:8])
        counts.append(user.get('count'))

    def bar_chart() -> Bar:
        c = (
            Bar()
            .add_xaxis(usernames)
            .add_yaxis("active", counts)
            .reversal_axis()
            .set_series_opts(label_opts=opts.LabelOpts(position="right"))
            .set_global_opts(title_opts=opts.TitleOpts(title="most active %d friends" % self.top_num))
        )
        return c

    # requires snapshot-selenium or snapshot-phantomjs
    make_snapshot(driver, bar_chart().render(), "bar.png")
3. Summary
With the steps above, the generated word cloud shows the topics and the value of a group chat over the recent period, and the analysis of the chat records gives an activity ranking of the group's members.
Of course, you can also rank the lurkers (the members who rarely speak) or analyze the data of a single group member.
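As a rough idea of how a lurker ranking could work, the sketch below sorts members by how few messages they sent. It assumes you have already obtained the full member list (for example from the chatroom table) and a per-member message count. The function name lurker_ranking and all sample values are hypothetical.

```python
def lurker_ranking(members, message_counts, bottom_num=3):
    """Return the bottom_num members with the fewest messages as (member, count) pairs."""
    # members who never spoke are missing from message_counts and count as 0
    counts = {member: message_counts.get(member, 0) for member in members}
    # sort by count first, then by member id so ties have a stable order
    return sorted(counts.items(), key=lambda pair: (pair[1], pair[0]))[:bottom_num]


# Made-up sample data
members = ["wxid_a", "wxid_b", "wxid_c", "wxid_d"]
message_counts = {"wxid_a": 3, "wxid_b": 2, "wxid_c": 1}
lurkers = lurker_ranking(members, message_counts, bottom_num=2)
print(lurkers)
```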
I have uploaded all the source code. Follow the official account “AirPython” and reply “wechat group chat” to get it.
If you found this article useful, please like and share it. Your support is my biggest encouragement.