Because of the pandemic, the holiday keeps getting extended and I haven't seen anyone in a long time.
With Valentine's Day coming up, I woke up with the idea of turning my WeChat chat history into a word cloud as a little declaration of love. Then I thought: since I had already downloaded my WeChat chat records, why not analyze the data a bit further? The goals of the analysis then became:
- Word cloud generation;
- Simple text analysis of chat logs.
1. Data acquisition
Because this part is not the main content, you can simply search for related articles, such as "Android crack wechat database to get chat records". For Android, the main steps are as follows:
- Obtain root permission on your phone, or install an Android emulator and enable "root permission";
- On the phone or emulator, open the path /data/data/com.tencent.mm/MicroMsg and find the folder whose name is a long string of characters (the one marked with a red box in the screenshot); inside it is a database file named EnMicroMsg.db, which stores the chat records;
- Compute the MD5 hash of the string "IMEI + UIN" (the phone's serial number concatenated with the user information number), take the 32-character lowercase result, and use its first 7 characters as the database password (see the sketch after this list);
- Use sqlcipher (link: pan.baidu.com/s/1Im3n02ys… extraction code: ka4z) to open EnMicroMsg.db and export it to CSV format.
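As a minimal sketch of the key derivation described above (the IMEI and UIN values below are placeholders; substitute your own):

import hashlib

imei = "123456789012345"  # placeholder: your phone's IMEI (serial number)
uin = "1234567890"        # placeholder: your WeChat UIN (user information number)

# 32-character lowercase MD5 of "IMEI + UIN"; the first 7 characters are the database password
key = hashlib.md5((imei + uin).encode()).hexdigest()[:7]
print(key)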
After opening the exported CSV in Excel, I found more than 13,000 chat records with my wife; WeChat really is the core of a long-distance relationship. A look at the data shows that type is the message type (for example, type = 1 is an ordinary text message and type = 3 is a voice message), isSend indicates who sent the message (isSend = 0 means the other party, 1 means me), createTime is the message timestamp, and content is the message text. The text analysis of the chat content is mainly based on these four columns.
After a brief cleanup, the CSV data is imported, and the entire analysis is done with pandas.
import pandas as pd
data = pd.read_csv('data.csv')
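As a quick sanity check after loading (a sketch assuming the column names are type, isSend and content as described above; your export may capitalize them differently):

# Keep only ordinary text messages (type == 1)
text_msgs = data[data['type'] == 1]

# Split by sender: isSend == 0 is the other party, isSend == 1 is me
from_her = text_msgs[text_msgs['isSend'] == 0]
from_me = text_msgs[text_msgs['isSend'] == 1]
print(len(text_msgs), len(from_her), len(from_me))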
2. Word cloud generation
To generate a word cloud in Python you need the wordcloud package; install it first if you don't have it:
pip install wordcloud
The word cloud generation process is not difficult. First, jieba is used to segment each chat record into words. Before segmentation, custom stop words and phrases are loaded, and the words obtained from each record are appended to an all_words list (a sketch of this step follows the snippet below). After segmentation, the word frequencies are computed in pandas in a way similar to Excel's pivot table and sorting, and the sorted frequencies are exported to CSV. The key lines of code are as follows:
import numpy as np

# Count how often each word appears, similar to an Excel pivot table
df = pd.DataFrame({'word': all_words})
words_count = df.groupby(by=['word'])['word'].agg(Count=np.size)
words_count = words_count.reset_index().sort_values(by="Count", ascending=False)
words_count.to_csv('word.csv', encoding='utf-8', index=False)
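For reference, the segmentation step mentioned above might look roughly like this (a sketch; the stop-word file and custom dictionary names are assumptions):

import jieba

# Load a custom phrase dictionary and stop words (file names are placeholders)
jieba.load_userdict('userdict.txt')
with open('stopwords.txt', encoding='utf-8') as f:
    stopwords = set(line.strip() for line in f)

# Segment every ordinary text message and collect the words
all_words = []
for content in data.loc[data['type'] == 1, 'content'].astype(str):
    all_words.extend(w for w in jieba.cut(content) if w.strip() and w not in stopwords)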
Opening word.csv, there are 14 words in total with a frequency greater than 100. Apart from our nicknames, two variants of "hahaha" occupy two of the remaining top-five spots.
In order to see the word frequency distribution more intuitively, the following code is used to generate the word cloud.
import imageio
import matplotlib.pyplot as plt
from wordcloud import WordCloud, ImageColorGenerator

# Use a heart-shaped image as both the mask and the colour source
loveImg = imageio.imread('love.png')
wordcloud = WordCloud(background_color='white',
                      mask=loveImg, font_path='simhei.ttf')

# Fit the cloud to the word frequencies computed above
words = words_count.set_index("word").to_dict()
wordcloud = wordcloud.fit_words(words["Count"])

# Recolour the cloud with the colours of the mask image
bimgColors = ImageColorGenerator(loveImg)
plt.axis("off")
plt.imshow(wordcloud.recolor(color_func=bimgColors))
plt.show()
Well... the word cloud shows much the same thing as the table, so I won't add more interpretation. Even I find it a bit too sweet; a proper case of feeding myself lemons...
3. Chat text analysis
In the previous section, the emoji codes had to be filtered out of the text in order to generate the word cloud. For the text analysis, I collected and counted those emoji and found the top 10 built-in emoji (sorted from left to right) sent by my wife and me. "Happy" is the most common one.
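A rough sketch of how those emoji could be counted (it assumes the built-in emoji appear in the message text as short bracketed codes such as [Happy]; the exact pattern is an assumption):

import re
from collections import Counter

# Built-in WeChat emoji usually show up in the text as short bracketed codes
emoji_pattern = re.compile(r'\[[^\[\]]{1,8}\]')

top_emoji = {}
for send in (0, 1):  # 0 = the other party, 1 = me
    texts = data.loc[data['isSend'] == send, 'content'].astype(str)
    counter = Counter()
    for t in texts:
        counter.update(emoji_pattern.findall(t))
    top_emoji[send] = counter.most_common(10)
print(top_emoji)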
Counting the messages by time of day shows that chat peaks around 12 noon and 10 PM, which suggests we are busy during the day... There is also a small peak around 12:30 at night, which fits nicely with our bedtime.
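A sketch of the hour-of-day count (assuming createTime is a millisecond Unix timestamp, which is how EnMicroMsg.db usually stores it):

import matplotlib.pyplot as plt
import pandas as pd

# Convert the timestamp to a datetime and count messages per hour of the day
hours = pd.to_datetime(data['createTime'], unit='ms').dt.hour
hours.value_counts().sort_index().plot(kind='bar')
plt.xlabel('hour of day')
plt.ylabel('messages')
plt.show()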
Using the method described in "Text Analysis of My Own Articles with Python", I treated each day as one "article", used TextRank (TF-IDF did not work as well) to extract each day's keywords, and then built the co-word matrix shown in the figure below.
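The sketch below shows this step with jieba's TextRank and a simple pairwise co-occurrence count over each day's keywords; the parameters are assumptions and the original details may differ.

import jieba.analyse
import pandas as pd
from itertools import combinations
from collections import Counter

# Group messages by calendar day and treat each day as one "article"
data['day'] = pd.to_datetime(data['createTime'], unit='ms').dt.date
daily_text = data.groupby('day')['content'].apply(lambda s: ' '.join(s.astype(str)))

# Extract each day's keywords with TextRank
daily_keywords = [jieba.analyse.textrank(text, topK=10) for text in daily_text]

# Count how often each pair of keywords appears on the same day (the co-word matrix)
cooccurrence = Counter()
for kws in daily_keywords:
    for a, b in combinations(sorted(set(kws)), 2):
        cooccurrence[(a, b)] += 1
print(cooccurrence.most_common(10))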
All the topics revolve around the word "wife", touching on news events of the time (for example, the outbreak, Iran), work (starting work, parking), study (education, jobs), daily life (eating, grabbing train tickets) and affection (hugs, happiness); it is a chat history with real warmth. Data analysis cannot determine feelings, only reflect them from one angle, but it turns out our relationship is positive and moving forward.