1. The scenario

A few days ago, a friend messaged me through the official account backend. He belongs to dozens of WeChat groups, has limited energy, and cannot keep up with all of them. He wanted to pick out the high-quality groups and asked whether I could think of a way to help.

In fact, all WeChat group chat records are stored in a local folder on the phone. Once the database is exported and decrypted, a round of data analysis can help him pick out the high-quality groups.

This article walks through the whole process step by step in Python.

2. Implementation steps

Step 1: Export the WeChat chat record database

First, log in to WeChat on a rooted phone or emulator, locate the WeChat chat record database, and export it to your local machine.

The full path to the database file is as follows:

# Full path to the WeChat chat record database
/data/data/com.tencent.mm/MicroMsg/[random string for the currently logged-in WeChat account]/EnMicroMsg.db

Note that if the current device is not rooted, you can migrate the group chat messages to a rooted device or emulator and export the database from there.

Step 2: Get the database password

The WeChat database password is built as follows: concatenate the phone's IMEI and the WeChat UIN, compute the MD5 hash of that string (32 lowercase hexadecimal characters), and take the first 7 characters.

The phone's IMEI can be obtained by dialing *#06#. A dual-SIM phone reports more than one IMEI, so you need to work out which one applies.

In the following configuration file, find the value whose name attribute is default_uin. That value is the UIN.

# Configuration file of the currently logged-in WeChat account
/data/data/com.tencent.mm/shared_prefs/system_config_prefs.xml
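As a rough sketch, the UIN can be read out of that file with Python's standard XML parser once the file has been copied off the device. The sample XML below is an assumption about the file's layout (Android shared-preferences files usually store integers as an int element with name and value attributes), and read_default_uin is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Assumed, simplified sample of system_config_prefs.xml; the real file has
# more entries and its layout may differ between WeChat versions.
SAMPLE_PREFS = """<map>
    <boolean name="set_service" value="true" />
    <int name="default_uin" value="1234567890" />
</map>
"""


def read_default_uin(xml_text):
    """Return the value of the entry whose name attribute is default_uin."""
    root = ET.fromstring(xml_text)
    for node in root.iter():
        if node.get("name") == "default_uin":
            return node.get("value")
    return None  # no such entry found


uin = read_default_uin(SAMPLE_PREFS)
print(uin)
```

The value is kept as a string because it is only concatenated with the IMEI in the next step.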

Finally, concatenate the IMEI and the UIN into one string and hash it with MD5. The first 7 characters of the lowercase 32-character digest are the password of the WeChat database.
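The rule above fits in a few lines of Python. This is a minimal sketch: wechat_db_password is a hypothetical helper name, and the IMEI and UIN values are placeholders rather than real identifiers.

```python
import hashlib


def wechat_db_password(imei, uin):
    """Concatenate IMEI and UIN, MD5-hash the string, keep the first 7 characters."""
    # hexdigest() returns 32 lowercase hexadecimal characters
    digest = hashlib.md5((imei + uin).encode("utf-8")).hexdigest()
    return digest[:7]


# Placeholder values for illustration only
password = wechat_db_password("866123456789012", "1234567890")
print(password)
```

If the phone reports two IMEIs, the function can simply be called once per IMEI and each candidate password tried in turn.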

Step 3: Decrypt the database

Because the WeChat database is created with SQLCipher, the SQLCipher command-line tool must be installed first.

# Install the SQLCipher command-line tool (Mac)
brew install sqlcipher

# On Windows, download the SQLCipher command-line executable

Then supply the database password and the decryption settings, and export the decrypted database.
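The exact decryption settings are not listed here, so the following is only a sketch of what this step can look like. It assumes the legacy SQLCipher settings that are commonly reported for this database (HMAC disabled, 4000 KDF iterations, 1024-byte pages); treat those three values as assumptions to verify against your own file, since newer SQLCipher builds may need further compatibility settings. build_decrypt_script, decrypt, and the file names are hypothetical; PRAGMA key, ATTACH ... KEY '', and sqlcipher_export are standard SQLCipher features.

```python
import subprocess


def build_decrypt_script(password, plain_path="decrypted.db"):
    """Build the SQL script that exports an encrypted database to plain SQLite."""
    return "\n".join([
        "PRAGMA key = '{}';".format(password),
        # Assumed legacy settings; adjust them if the database fails to open
        "PRAGMA cipher_use_hmac = OFF;",
        "PRAGMA kdf_iter = 4000;",
        "PRAGMA cipher_page_size = 1024;",
        # Attach an empty, unencrypted database and copy everything into it
        "ATTACH DATABASE '{}' AS plaintext KEY '';".format(plain_path),
        "SELECT sqlcipher_export('plaintext');",
        "DETACH DATABASE plaintext;",
    ])


def decrypt(encrypted_path, password, plain_path="decrypted.db"):
    """Feed the script to the sqlcipher command-line tool (must be on PATH)."""
    script = build_decrypt_script(password, plain_path)
    subprocess.run(["sqlcipher", encrypted_path], input=script, text=True, check=True)


# Only print the script here; call decrypt("EnMicroMsg.db", ...) to run it
print(build_decrypt_script("abc1234"))
```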

Step 4: Analyze the database

It is recommended to open and inspect the decrypted database with SQLiteStudio, focusing on three tables: message, rcontact, and chatroom.

All WeChat text chat records are stored in the message table, including the chat content, sender, message type, creation time, and so on.

rcontact is the WeChat contacts table, including the WeChat ID, nickname, and remark name.

chatroom is the group chat information table, including the group chat ID and the member list.
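The same inspection can be done from Python before writing any analysis code. The sketch below uses an in-memory database as a stand-in for the decrypted file; the column lists are simplified assumptions, and list_tables and list_columns are hypothetical helper names.

```python
import sqlite3


def list_tables(conn):
    """Return the names of all tables in the database, sorted alphabetically."""
    rows = conn.execute("SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")
    return [row[0] for row in rows]


def list_columns(conn, table):
    """Return the column names of one table via PRAGMA table_info."""
    return [row[1] for row in conn.execute("PRAGMA table_info({})".format(table))]


# Stand-in for the decrypted database; real tables have many more columns
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE message(talker TEXT, content TEXT, isSend INTEGER, type INTEGER, createTime INTEGER)")
conn.execute("CREATE TABLE rcontact(username TEXT, nickname TEXT, conRemark TEXT)")
conn.execute("CREATE TABLE chatroom(chatroomname TEXT, memberlist TEXT)")

for table in list_tables(conn):
    print(table, list_columns(conn, table))
```

Against the real file, replace ":memory:" with the path of the decrypted database and drop the CREATE TABLE statements.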

Step 5: Open the database in Python and wrap it

Connect to the local database file with sqlite3 to get the database object and the cursor object.

import sqlite3

def __init__(self, db_path="./weixin.db"):
    """Local database initialization"""
    self.db = sqlite3.connect(db_path)
    self.cursor = self.db.cursor()

Next, wrap the common database operations (insert, delete, update, and query) in helper methods.

def execute(self, sql, param=None):
    """
    Run a write statement
    sql: SQL statement (insert, delete, or update)
    param: data; can be a list, a dictionary, or empty
    """
    count = 0
    try:
        if param is None:
            self.cursor.execute(sql)
        elif type(param) is list:
            # A list of rows is written in one batch
            self.cursor.executemany(sql, param)
        else:
            self.cursor.execute(sql, param)
        # Read the change counter on every path so count is always defined
        count = self.db.total_changes
        self.db.commit()
    except Exception as e:
        print(e)
        return False, e
    return True if count > 0 else False

def query(self, sql, param=None):
    """
    Run a query statement
    sql: SQL statement
    param: parameters, may be empty
    return: all rows returned by the query
    """
    if param is None:
        self.cursor.execute(sql)
    else:
        self.cursor.execute(sql, param)
    return self.cursor.fetchall()
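Put together, the wrapper could look like the sketch below. The class name DbHelper is hypothetical, and two details differ from the methods above as a design choice: execute always returns a plain boolean, and it compares total_changes before and after the call, because that counter is cumulative for the whole connection.

```python
import sqlite3


class DbHelper:
    """Thin wrapper around a sqlite3 connection (hypothetical class name)."""

    def __init__(self, db_path="./weixin.db"):
        self.db = sqlite3.connect(db_path)
        self.cursor = self.db.cursor()

    def execute(self, sql, param=None):
        """Run a write statement; a list of rows goes through executemany."""
        before = self.db.total_changes
        try:
            if param is None:
                self.cursor.execute(sql)
            elif isinstance(param, list):
                self.cursor.executemany(sql, param)
            else:
                self.cursor.execute(sql, param)
            self.db.commit()
        except sqlite3.Error as e:
            print(e)
            return False
        # True only if this call inserted, updated, or deleted at least one row
        return self.db.total_changes > before

    def query(self, sql, param=None):
        """Run a SELECT statement and return all rows."""
        if param is None:
            self.cursor.execute(sql)
        else:
            self.cursor.execute(sql, param)
        return self.cursor.fetchall()


# Usage with an in-memory database instead of the decrypted WeChat file
helper = DbHelper(":memory:")
helper.execute("CREATE TABLE demo(id INTEGER PRIMARY KEY, name TEXT)")
ok = helper.execute("INSERT INTO demo(name) VALUES (?)", [("a",), ("b",)])
rows = helper.query("SELECT name FROM demo ORDER BY id")
print(ok, rows)
```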

Step 6: Get the group chat ID from the group chat name

Using the group chat nickname, query the rcontact table with an SQL statement to obtain the group chat ID.

def __get_chartroom_id(self):
    """
    Get the group chat ID
    :return:
    """
    res = self.db.query('select username from rcontact where nickname=?;', (self.chatroom_name,))
    # Group chat ID
    chatroom_id = res[0][0]
    return chatroom_id

Step 7: Get group chat messages

Once you have the group chat ID, query the message table to get all the messages in the current group chat.

# isSend=0: sent by other members; isSend=1: sent by yourself
sql = "SELECT content FROM message WHERE talker='{}' and isSend=0".format(chatroom_id)
result = self.db.query(sql)
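Steps 6 and 7 can be tried end to end on a small in-memory database. In this sketch the rows are made-up sample data, get_chatroom_id and get_received_messages are hypothetical function names, and both queries pass their values as parameters instead of formatting them into the SQL string, which avoids quoting problems when an ID or name contains a quote character.

```python
import sqlite3

# Made-up sample data standing in for the decrypted database
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rcontact(username TEXT, nickname TEXT)")
conn.execute("CREATE TABLE message(talker TEXT, content TEXT, isSend INTEGER)")
conn.execute("INSERT INTO rcontact VALUES ('12345@chatroom', 'Python Study Group')")
conn.executemany(
    "INSERT INTO message VALUES (?, ?, ?)",
    [
        ("12345@chatroom", "wxid_a:\nhello", 0),
        ("12345@chatroom", "my own message", 1),
        ("99999@chatroom", "wxid_b:\nother group", 0),
    ],
)


def get_chatroom_id(conn, chatroom_name):
    """Look up the group chat ID by its nickname; None if there is no match."""
    row = conn.execute(
        "SELECT username FROM rcontact WHERE nickname=?", (chatroom_name,)
    ).fetchone()
    return row[0] if row else None


def get_received_messages(conn, chatroom_id):
    """Return the content of every message in the group that others sent."""
    rows = conn.execute(
        "SELECT content FROM message WHERE talker=? AND isSend=0", (chatroom_id,)
    )
    return [row[0] for row in rows]


chatroom_id = get_chatroom_id(conn, "Python Study Group")
print(get_received_messages(conn, chatroom_id))
```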

To keep only meaningful message content, filter out messages sent by yourself, system messages, red envelope messages, and similar content.

for item in result:
    # Skip empty rows, XML payloads, system messages, and red envelope messages
    if (not item or not item[0]
            or item[0].find('xml') != -1
            or item[0].find('sysmsg') != -1
            or item[0].find('chatroom') != -1
            or item[0].find('weixinhongbao') != -1):
        continue
    # Split off the sender prefix
    temps = item[0].split(':')
    if len(temps) < 2:
        # print('Invalid message: ' + item[0])
        continue
    # Sender
    send_from = item[0].split(':')[0]
    # Message content: keep only the message body
    send_msg = "".join(item[0].split(':')[1:]).strip().replace("\"", "")
    # Skip overly long messages
    if len(send_msg) > 200:
        continue

For messages sent by other group members, the sender prefix (the part of the content before the colon) is stripped and only the message body is kept.
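The cleaning rules can be collected into one function that is easy to test on sample rows. This is a sketch under assumptions: clean_messages is a hypothetical name, the marker list only contains the substrings named in this step, and the content is split at the first colon only, so colons inside the message body survive.

```python
def clean_messages(rows, max_len=200):
    """Return (sender, body) pairs for rows that look like normal text messages."""
    markers = ("xml", "sysmsg", "chatroom", "weixinhongbao")
    cleaned = []
    for row in rows:
        content = row[0] if row else None
        # Drop empty rows and anything that looks like a non-text message
        if not content or any(marker in content for marker in markers):
            continue
        # Group messages are expected to look like "sender:body"
        sender, sep, body = content.partition(":")
        if not sep:
            continue
        body = body.strip().replace('"', "")
        # Very long messages are skipped
        if len(body) > max_len:
            continue
        cleaned.append((sender, body))
    return cleaned


sample_rows = [
    ("wxid_a:\nhello world",),
    ("<sysmsg>someone joined</sysmsg>",),
    ("no colon here",),
    (None,),
    ("wxid_b:\n" + "x" * 201,),
    ("wxid_c:\nsay \"hi\" at 10:30",),
]
print(clean_messages(sample_rows))
```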

Step 8: Generate the word cloud

jieba segments the valid group messages into words, and wordcloud generates the word cloud from them.

import jieba
import matplotlib.pyplot as plt
from wordcloud import WordCloud

def generate_wordcloud(self, word):
    """
    Generate the word cloud
    :param word: segmented text, words separated by spaces
    :return:
    """
    img = WordCloud(font_path="./DroidSansFallbackFull.ttf", width=2000, height=2000,
                    margin=2, collocations=False).generate(word)
    plt.imshow(img)
    plt.axis("off")
    plt.show()
    # Save the picture
    img.to_file("{}.png".format("group"))

# Word segmentation
temp = " ".join(jieba.cut(words, cut_all=True))
# Generate the word cloud
generate_wordcloud(temp)

Step 9: Create a ranking table and insert data

To rank group chat activity, we need to create a new table with three fields: ID, WeChat nickname, and message content.

def __create_top_table(self):
    """
    Create the top table
    :return:
    """
    # Create the top table if it does not exist yet
    result = self.db.execute("CREATE TABLE IF NOT EXISTS top(uid INTEGER primary key, name varchar(200), msg varchar(200))")

Next, insert the sender ID and the message content of each message from the previous step into the newly created top table.

msg_pre = []
for item in result:
    # Sender
    send_from = item[0].split(':')[0]
    # Message content
    send_msg = "".join(item[0].split(':')[1:]).strip().replace("\"", "")
    msg_pre.append((send_from, send_msg))

# Insert all rows into the top table in one batch
self.db.execute("insert into top(uid, name, msg) values (NULL, ?, ?);", msg_pre)
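The table creation and the batch insert can be checked in isolation with plain sqlite3. The sketch below uses an in-memory database and made-up rows; passing NULL for uid lets SQLite assign the INTEGER PRIMARY KEY automatically.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE IF NOT EXISTS top("
    "uid INTEGER PRIMARY KEY, name VARCHAR(200), msg VARCHAR(200))"
)

# Made-up (sender, message) pairs, shaped like the cleaned messages
msg_pre = [("wxid_a", "hello"), ("wxid_b", "hi"), ("wxid_a", "anyone here?")]

# executemany runs the statement once per tuple in the list
conn.executemany("INSERT INTO top(uid, name, msg) VALUES (NULL, ?, ?)", msg_pre)
conn.commit()

stored = conn.execute("SELECT uid, name, msg FROM top ORDER BY uid").fetchall()
print(stored)
```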

Step 10: Get the activity ranking and visualize it

From the top table, count the number of messages each member sent, grouped by the name field, and save the results in a list.

def get_top_partner(self):
    """
    Get the most active members
    :return:
    """
    sql = "SELECT name as name, COUNT(*) as times FROM top GROUP BY name ORDER BY times DESC LIMIT %d;" % self.top_num
    result = self.db.query(sql)
    for item in result:
        # User ID
        id = item[0]
        # Number of messages sent
        count = item[1]
        # Look up the nickname for this user ID
        username = self.get_username(id)
        self.top_data.append({'username': username, 'count': count})
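The ranking query itself can be tried on a few made-up rows. In this sketch top_senders is a hypothetical function name, the limit is passed as a query parameter, and the nickname lookup is left out, so the sender ID is reported as the username.

```python
import sqlite3


def top_senders(conn, top_num=15):
    """Count messages per sender and return the most active ones first."""
    sql = ("SELECT name, COUNT(*) AS times FROM top "
           "GROUP BY name ORDER BY times DESC LIMIT ?")
    return [{"username": name, "count": times}
            for name, times in conn.execute(sql, (top_num,))]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE top(uid INTEGER PRIMARY KEY, name VARCHAR(200), msg VARCHAR(200))")
conn.executemany(
    "INSERT INTO top(name, msg) VALUES (?, ?)",
    [("wxid_a", "1"), ("wxid_b", "2"), ("wxid_a", "3"),
     ("wxid_c", "4"), ("wxid_a", "5"), ("wxid_b", "6")],
)

ranking = top_senders(conn, top_num=2)
print(ranking)  # [{'username': 'wxid_a', 'count': 3}, {'username': 'wxid_b', 'count': 2}]
```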

Finally, strip the special symbols from the WeChat nicknames and use Pyecharts to visualize the data.

from pyecharts import options as opts
from pyecharts.charts import Bar
from pyecharts.render import make_snapshot
from snapshot_selenium import snapshot as driver

def draw_image(self):
    """
    Data visualization
    :return:
    """
    usernames = []
    counts = []
    for user in self.top_data:
        # Strip special symbols from the nickname and keep the first 8 characters
        usernames.append(get_avA_string(user.get('username').strip())[0:8])
        counts.append(user.get('count'))

    def bar_chart() -> Bar:
        c = (
            Bar()
            .add_xaxis(usernames)
            .add_yaxis("Activity", counts)
            .reversal_axis()
            .set_series_opts(label_opts=opts.LabelOpts(position="right"))
            .set_global_opts(title_opts=opts.TitleOpts(title="The %d most active members" % self.top_num))
        )
        return c

    # Requires snapshot-selenium or snapshot-phantomjs
    make_snapshot(driver, bar_chart().render(), "bar.png")
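The helper that strips special symbols from nicknames is not shown in this article. One possible approach, offered only as a guess at what such a helper could do, is to keep word characters (letters, digits, underscores, and CJK characters) and drop everything else; clean_nickname is a hypothetical name.

```python
import re


def clean_nickname(nickname, max_len=8):
    """Remove symbols, punctuation, and spaces, then keep the first max_len characters."""
    # \W matches every character that is not a Unicode word character
    return re.sub(r"\W", "", nickname.strip())[:max_len]


print(clean_nickname("Tom*#2020"))
print(clean_nickname("a_very_long_nickname"))
```

Truncating after the cleanup keeps the chart labels short, in the same way as the [0:8] slice in draw_image.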


3. Conclusion

With the steps above, the generated word cloud shows what the group has been discussing recently and how valuable those discussions are, and the analysis of the chat records yields the group's activity ranking.

Of course, the same data can also be used to rank the lurkers in a group or to analyze a single member's activity.

The complete source code is available through the official account backend: follow the official account “AirPython” and reply “wechat group chat” to get it.

If you think the article is good, please like it and share it. Your affirmation is my biggest encouragement and support.

