This is the second article on sentiment analysis. It analyzes QQ group messages with Python: the number of messages sent by each group member, a comparison of the members' overall sentiment, a word cloud of a single member's messages, and the sentiment trend of a single member's messages. Libraries used:
- re (regular expressions), Matplotlib, WordCloud, NumPy, jieba (word segmentation), and SnowNLP (sentiment analysis).
- These libraries are simple to use and easy to install, so the barrier to entry is low.
It is best read together with the first article, or after reading the first article in detail. The modules were built as follows:
- Regular expressions match the text content. The first article only dealt with the text as a whole and did not group messages by member. Here a dictionary {} stores each student's QQ number or email, nickname, and the list of messages he sent. One important point: a nickname may change over time, and the group log shows the nickname that was in use when each message was sent, so the same person may appear first as "big speaker" and later as something else. My approach is therefore to identify a member uniquely by QQ number or email and to keep only the nickname from the first appearance. Each message is then appended to that member's text list.
value = {}  # key: QQ number or email -> {'name', 'qq', 'text'}

def analyseinformation(lines):
    qqnow = ''  # QQ number or email of the current speaker
    for line in lines:
        if line is not None and line.strip() != "" and "Withdrawn." not in line:
            # Strip placeholders; these marker strings must match the text in your own exported log
            line = (line.replace("[expression]", "")
                        .replace("@ All members", "")
                        .replace('[QQ red envelope] I sent a "exclusive red envelope", please use the new version of mobile QQ check red.', "")
                        .replace("\n", "")
                        .replace("[images]", ""))
            if pattern.search(line):  # header line: timestamp plus nickname and QQ/email
                # print(line)
                if pattern3.search(line):
                    qq1 = str(pattern3.search(line).group(3))
                    namenow = str(pattern3.search(line).group(1))
                    if qq1 not in value:
                        value[qq1] = {'name': namenow, 'qq': qq1, 'text': []}
                    qqnow = qq1  # the current speaker has changed
                elif pattern4.search(line):
                    email = str(pattern4.search(line).group(2))
                    namenow = str(pattern4.search(line).group(1))
                    if email not in value:
                        value[email] = {'name': namenow, 'qq': email, 'text': []}
                    qqnow = email
                    # print(name)
            elif qqnow != '':  # pitfall: qqnow starts as '', so the first lines before any header are skipped
                value[qqnow]['text'].append(str(line))
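The grouping idea above can be tried on a tiny made-up log. This is only a minimal sketch, not the code used in this article: the `HEADER` pattern, the `group_messages` helper, and the sample lines are all assumptions for illustration, and it assumes every header line looks like `date time nickname(QQ)` or `date time nickname<email>`.

```python
import re

# Assumed header format: "YYYY-MM-DD HH:MM:SS nickname(QQ)" or "... nickname<email>"
HEADER = re.compile(
    r'^\d{4}-\d{2}-\d{2} \d{1,2}:\d{2}:\d{2} (.+?)(?:\((\d+)\)|<([^<>]+)>)$'
)

def group_messages(lines):
    """Group message lines by QQ number or email, keeping the first nickname seen."""
    members = {}
    current = None  # key of the member whose messages are being read
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        m = HEADER.match(line)
        if m:
            nickname, qq, email = m.groups()
            current = qq or email
            # setdefault keeps the first nickname even if it changes later
            members.setdefault(current, {'name': nickname, 'text': []})
        elif current is not None:
            members[current]['text'].append(line)
    return members

# Made-up sample log: member 10001 changes nickname between messages
sample = [
    "2018-05-07 13:48:39 Big Speaker(10001)",
    "hello everyone",
    "2018-05-07 13:49:02 Alice<alice@example.com>",
    "hi",
    "2018-05-08 09:00:00 Renamed(10001)",
    "good morning",
]
members = group_messages(sample)
print(members['10001'])
```

Both messages from 10001 end up under one key, and the stored name stays at the first nickname, which is the behaviour described above.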
- All the useful information now sits in the dictionary value, but I want some of it in separate lists, so a little extra processing is needed:
time = []  # number of messages per member
text = []  # all messages of a member joined into one string
name = []  # nickname
qq = []    # QQ number or email

def getmotion(values):
    for key in values:
        print(values[key])
        time.append(len(values[key]['text']))
        usertxt = ''
        for txt in values[key]['text']:
            usertxt += txt + ' '
        text.append(usertxt)
        name.append(values[key]['name'])
        qq.append(key)
- Now the parts to display can be built. First, I want to see how many messages each person sent during that period. Since a matplotlib chart cannot show too many bars clearly, I only display a subset (you can change the range yourself; it only affects how the chart looks). You can also sort the data first and then display it:
# Show the number of messages sent by each student
def getspeaktimeall(time, name):
    Xi = np.array(time[20:50])  # adjust the slice to what you want to show; our group has too many members
    Yi = np.array(name[20:50])
    x = np.arange(len(Xi))  # one bar position per selected member
    width = 0.6
    plt.rcParams['font.sans-serif'] = ['SimHei']  # display Chinese labels correctly
    plt.figure(figsize=(8, 6))  # image ratio 8:6
    plt.barh(x, Xi, width, color='SkyBlue', alpha=0.8)
    plt.xlabel("time")
    plt.ylabel("name")
    for a, b, c in zip(Xi, Yi, x):
        print(a, b, c)
        plt.text(a + 10, c - 0.4, '%d' % int(a), ha='center', va='bottom')
    plt.yticks(x, Yi)
    # plt.legend()
    plt.show()
    plt.close()
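The custom sort mentioned above could look roughly like this. It is a sketch only: `top_speakers` and the demo lists are hypothetical names, and it simply ranks members by message count so that the most active ones are plotted instead of an arbitrary slice.

```python
def top_speakers(counts, names, n=30):
    """Return the n most active members as (names, counts), most active first."""
    ranked = sorted(zip(counts, names), key=lambda pair: pair[0], reverse=True)[:n]
    top_counts = [c for c, _ in ranked]
    top_names = [nm for _, nm in ranked]
    return top_names, top_counts

# Made-up data standing in for the name[] and time[] lists
names_demo = ['A', 'B', 'C', 'D']
counts_demo = [12, 340, 7, 95]
top_names, top_counts = top_speakers(counts_demo, names_demo, n=3)
print(top_names, top_counts)
```

The two returned lists could then take the place of the `name[20:50]` and `time[20:50]` slices inside getspeaktimeall.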
- I would also like to compare the overall sentiment of everyone's messages. So I join each student's messages into one big string and analyze it with SnowNLP, whose API is very simple: s = SnowNLP(text), then print(s.sentiments). The pitfall in this part is displaying the labels, which I covered in an earlier note, so I won't go into detail here. The detailed code for this part is:
def getemotionall(time, text, name, qq):
    emotion = []
    for i in range(0, len(qq)):
        print(name[i], text[i])
        s = SnowNLP(text[i])
        emotion.append(s.sentiments * 100)  # scale the 0-1 score to 0-100
    print(len(name), len(emotion))
    Xi = np.array(emotion[10:40])
    Yi = np.array(name[10:40])
    x = np.arange(len(Xi))  # one bar position per selected member
    width = 0.6
    plt.rcParams['font.sans-serif'] = ['SimHei']  # display Chinese labels correctly
    plt.figure(figsize=(8, 6))  # image ratio 8:6
    plt.barh(x, Xi, width, color='red', label='The total emotion of the speech', alpha=0.8)
    plt.xlabel("emotion")
    plt.ylabel("name")
    for a, b, c in zip(Xi, Yi, x):
        print(a, b, c)
        plt.text(a + 2, c - 0.4, '%d' % int(a), ha='center', va='bottom')
    plt.yticks(x, Yi)
    # plt.legend()
    plt.show()
    plt.close()
- Next I want to build a word cloud for each person's messages. The first article already covered how to do this, so the code is left for the full listing. You can check the messages of people you dislike, people you like, or any two people, and see what she cares about. Hehe.
- The other thing I want to look at is each person's sentiment trend, which is also useful: you can analyze her or his recent mood, and if you add a least-squares fit you can also draw a predicted trend (I won't draw it here). I use a line chart, where 1 is positive, 0 is negative, and 0.5 is neutral. Because some people send so many messages that the chart becomes unreadable, I only take the latest 200 messages. The x-axis is the message index rather than the time; you can integrate the time if you are interested. The core code is:
def getemotionbyqq(value, qq):
    va = value[qq]['text']
    emotion = []
    for q in va[-200:]:  # only the latest 200 messages
        s = SnowNLP(q)
        emotion.append(s.sentiments)
        # print(s.sentiments)
    x = np.arange(len(emotion))
    y = np.array(emotion)
    plt.rcParams['font.sans-serif'] = ['SimHei']  # display Chinese labels correctly
    plt.figure(figsize=(12, 6))  # image ratio 12:6
    plt.plot(x, y, label='emotion status')
    plt.xlabel("Sentiment Trends in the last 200 Speeches")
    plt.ylabel("0 = negative, 1 = positive")
    plt.legend()
    plt.show()
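The least-squares prediction mentioned above is not implemented in this article. A minimal sketch of one way it might be done, assuming a straight line is fitted to the scores against the message index; `fit_trend`, `predict`, and the demo scores are hypothetical and the fit uses plain Python only:

```python
def fit_trend(scores):
    """Ordinary least-squares line through (0, s0), (1, s1), ...; returns (slope, intercept)."""
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two scores to fit a line")
    xs = range(n)
    mean_x = (n - 1) / 2
    mean_y = sum(scores) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, scores))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(slope, intercept, x):
    """Predicted score at message index x, clipped to the 0..1 range of the scores."""
    return min(1.0, max(0.0, slope * x + intercept))

# Made-up scores that fall steadily from 0.9 to 0.5
demo_scores = [0.9, 0.8, 0.7, 0.6, 0.5]
slope, intercept = fit_trend(demo_scores)
next_score = predict(slope, intercept, len(demo_scores))
print(slope, intercept, next_score)
```

A negative slope suggests the mood is drifting toward negative. With NumPy already imported, `np.polyfit(x, y, 1)` should give the same slope and intercept, and the fitted line could be drawn with a second plt.plot call on top of the line chart.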
The full code and the results:
import re
from snownlp import SnowNLP
import numpy as np
import matplotlib.pyplot as plt  # plotting library
from wordcloud import WordCloud
import jieba.analyse

time = []  # number of messages per member
text = []  # all messages of a member joined into one string
name = []  # nickname
qq = []    # QQ number or email
value = {}  # key: QQ number or email -> {'name', 'qq', 'text'}

pattern = re.compile(r'(\d*)-(\d*)-(\d*) .* .*')  # matches a header line such as: 2018-05-07 13:48:39 2XXX<...@qq.com>
# pattern2 = re.compile(r'(\d+):(\d+):\d+')  # matches 15:55:40
pattern3 = re.compile(r'(\S+)(\()(.*?)(\))')  # matches nickname(QQ number), e.g. a header ending in (1315426911)
pattern4 = re.compile(r'(\S+)[<](.*)[>]')  # matches nickname<email>

def getemotionbyqq(value, qq):
    va = value[qq]['text']
    emotion = []
    for q in va[-200:]:  # only the latest 200 messages
        s = SnowNLP(q)
        emotion.append(s.sentiments)
        # print(s.sentiments)
    x = np.arange(len(emotion))
    y = np.array(emotion)
    plt.rcParams['font.sans-serif'] = ['SimHei']  # display Chinese labels correctly
    plt.figure(figsize=(12, 6))  # image ratio 12:6
    plt.plot(x, y, label='emotion status')
    plt.xlabel("Sentiment Trends in the last 200 Speeches")
    plt.ylabel("0 = negative, 1 = positive")
    plt.legend()
    plt.show()

def getstudentcloudbyqq(value, qq):
    va = value[qq]['text']
    text = ''
    for q in va:
        text += q + ' '
    print(text)
    ags = jieba.analyse.extract_tags(text, topK=40)  # top 40 keywords
    text = ' '.join(ags)
    wc = WordCloud(background_color="white",
                   width=1500, height=1000,
                   min_font_size=40,
                   font_path="simhei.ttf",  # SimHei font
                   max_font_size=300,  # maximum font size
                   random_state=40,  # seed for the random layout and colors, so the result is reproducible
                   )
    # Font pitfall: font_path must be set, otherwise Chinese text is displayed as a bunch of small boxes
    # wc.font_path = "simhei.ttf"
    my_wordcloud = wc.generate(text)
    plt.imshow(my_wordcloud)
    plt.axis("off")
    plt.show()
    plt.close()

def getemotionall(time, text, name, qq):
    emotion = []
    for i in range(0, len(qq)):
        print(name[i], text[i])
        s = SnowNLP(text[i])
        emotion.append(s.sentiments * 100)  # scale the 0-1 score to 0-100
    print(len(name), len(emotion))
    Xi = np.array(emotion[10:40])
    Yi = np.array(name[10:40])
    x = np.arange(len(Xi))  # one bar position per selected member
    width = 0.6
    plt.rcParams['font.sans-serif'] = ['SimHei']  # display Chinese labels correctly
    plt.figure(figsize=(8, 6))  # image ratio 8:6
    plt.barh(x, Xi, width, color='red', label='The total emotion of the speech', alpha=0.8)
    plt.xlabel("emotion")
    plt.ylabel("name")
    for a, b, c in zip(Xi, Yi, x):
        print(a, b, c)
        plt.text(a + 2, c - 0.4, '%d' % int(a), ha='center', va='bottom')
    plt.yticks(x, Yi)
    # plt.legend()
    plt.show()
    plt.close()

# Show the number of messages sent by each student
def getspeaktimeall(time, name):
    Xi = np.array(time[20:50])  # adjust the slice to what you want to show; our group has too many members
    Yi = np.array(name[20:50])
    x = np.arange(len(Xi))  # one bar position per selected member
    width = 0.6
    plt.rcParams['font.sans-serif'] = ['SimHei']  # display Chinese labels correctly
    plt.figure(figsize=(8, 6))  # image ratio 8:6
    plt.barh(x, Xi, width, color='SkyBlue', alpha=0.8)
    plt.xlabel("time")
    plt.ylabel("name")
    for a, b, c in zip(Xi, Yi, x):
        print(a, b, c)
        plt.text(a + 10, c - 0.4, '%d' % int(a), ha='center', va='bottom')
    plt.yticks(x, Yi)
    # plt.legend()
    plt.show()
    plt.close()

def getmotion(values):
    for key in values:
        print(values[key])
        time.append(len(values[key]['text']))
        usertxt = ''
        for txt in values[key]['text']:
            usertxt += txt + ' '
        text.append(usertxt)
        name.append(values[key]['name'])
        qq.append(key)
    # getmatplotlibtime(time, text, name, qq)
    # getmatplotlibemotion(time, text, name, qq)
    # print(time)

def analyseinformation(lines):
    qqnow = ''  # QQ number or email of the current speaker
    for line in lines:
        if line is not None and line.strip() != "" and "Withdrawn." not in line:
            # Strip placeholders; these marker strings must match the text in your own exported log
            line = (line.replace("[expression]", "")
                        .replace("@ All members", "")
                        .replace('[QQ red envelope] I sent a "exclusive red envelope", please use the new version of mobile QQ check red.', "")
                        .replace("\n", "")
                        .replace("[images]", ""))
            if pattern.search(line):  # header line: timestamp plus nickname and QQ/email
                # print(line)
                if pattern3.search(line):
                    qq1 = str(pattern3.search(line).group(3))
                    namenow = str(pattern3.search(line).group(1))
                    if qq1 not in value:
                        value[qq1] = {'name': namenow, 'qq': qq1, 'text': []}
                    qqnow = qq1  # the current speaker has changed
                elif pattern4.search(line):
                    email = str(pattern4.search(line).group(2))
                    namenow = str(pattern4.search(line).group(1))
                    if email not in value:
                        value[email] = {'name': namenow, 'qq': email, 'text': []}
                    qqnow = email
                    # print(name)
            elif qqnow != '':  # pitfall: qqnow starts as '', so the first lines before any header are skipped
                value[qqnow]['text'].append(str(line))
                # print(name)
                # print(value[name])

if __name__ == '__main__':
    # Chat log to analyze (saved as UTF-8 to avoid encoding trouble)
    with open('E:/text.txt', 'r', encoding='utf-8') as f:
        lines = f.readlines()
    value = {}
    analyseinformation(lines)
    getmotion(value)  # fills the time[], text[], name[] and qq[] lists from value
    # Core analysis functions:
    getspeaktimeall(time, name)  # number of messages per student in the selected range
    getemotionall(time, text, name, qq)
    getstudentcloudbyqq(value, '694459644')
    getemotionbyqq(value, '694459644')
- Number of messages (I chose a small range for display)
- Overall sentiment comparison (this runs slower because there is more text)
- Word cloud (I secretly chose the class monitor, who speaks a lot)
- Individual sentiment trend (I secretly analyzed the class monitor's)
You can see that this guy's last few messages have not been very positive.
There is one regret: because SnowNLP is trained on a corpus of positive and negative product reviews, it is inaccurate in some places. I hope to have the chance to build a sentiment analysis package myself someday. Well, there is a long way to go.
My skill level is limited, so this is about as far as I can take it. The program is not concise; if there is anything to improve, experts, please point it out!