1. Introduction

1. Over the past couple of days, the same topic kept showing up in my WeChat Moments, in short videos, and on Weibo: the King of Glory Mi Yue “Bai Jingjing” skin.

2. Out of curiosity I searched Baidu and found that it is a skin King of Glory released recently. Seeing how many people were discussing it, I decided to crawl the comments to find out what people think of the skin and why it is so popular.

3. The skin is tied to the movie “A Chinese Odyssey”; the official website has a detailed introduction (link below). The goal today, though, is not to introduce the skin but to practice crawling and data analysis.

https://pvp.qq.com/coming/v2/skins/0125-bjjmy.shtml?ADTAG=pvp.skin.pcgw

2. Crawl comment data

Comment source platform

Bilibili (B station) is used here. Related videos can be found by searching for the keyword: King of Glory Bai Jingjing. Comments from several videos are collected for analysis.

To explain how the data is obtained, the most-viewed video is used as the example.

Collect comments

The video link is as follows:

https://www.bilibili.com/video/BV1dr4y1N7YT?from=search&seid=14952442383235795708

With the video chosen, the next step is to fetch its bullet-screen comments (danmaku).

I previously wrote an article analyzing the bullet-screen comments of “Send You a Little Red Flower” on Bilibili. If you have read it, you already know how to fetch a video's bullet-screen comments; here is a short recap.

1. Press F12 on the video page to inspect the network requests

Find this request:

https://api.bilibili.com/x/v2/dm/thumbup/stats?oid=287593212&ids=44298586098761735

The oid value, 287593212, can be read from the URL.

2. Get the bullet-screen comments

Bilibili has a dedicated API for fetching bullet-screen comments. Changing the oid value returns the bullet-screen comments of the corresponding video:

https://api.bilibili.com/x/v1/dm/list.so?oid=287593212

This gives us the bullet-screen comments posted by viewers, which are analyzed in the sections that follow.
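The two steps above can be sketched in a few lines: read the `oid` query parameter from the stats URL, then drop it into the bullet-screen comment URL. The helper names `extract_oid` and `danmaku_url` are made up for illustration; only the two URLs come from this article.

```python
from urllib.parse import urlparse, parse_qs

# URLs as they appear in this article.
STATS_URL = "https://api.bilibili.com/x/v2/dm/thumbup/stats?oid=287593212&ids=44298586098761735"
DANMAKU_API = "https://api.bilibili.com/x/v1/dm/list.so"


def extract_oid(url):
    """Return the value of the `oid` query parameter (hypothetical helper)."""
    query = parse_qs(urlparse(url).query)
    return query["oid"][0]


def danmaku_url(oid):
    """Build the bullet-screen comment URL for one video (hypothetical helper)."""
    return f"{DANMAKU_API}?oid={oid}"


oid = extract_oid(STATS_URL)
print(oid)
print(danmaku_url(oid))
```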

Implementation

import requests

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:64.0) Gecko/20100101 Firefox/64.0'
}
# One oid per video whose bullet-screen comments will be fetched
oidlist = ['287593212', '286279864', '285602874', '288005981', '287970790', '287621268']
for j in oidlist:
    url = "https://api.bilibili.com/x/v1/dm/list.so?oid=" + str(j)
    r = requests.get(url, headers=headers)

The oidlist here holds six values, so the bullet-screen comments of six videos are fetched.
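The request snippet stops at the HTTP call, and the article does not show how `list_s` (used in the next snippet) is built. One plausible way, assuming the response is XML in which each comment is a `<d p="...">text</d>` element, is a regular expression. The sample string and its `p` values below are invented for illustration.

```python
import re

# Invented sample in the assumed response shape; a real response would come
# from r.text in the request loop.
sample_xml = (
    '<?xml version="1.0" encoding="UTF-8"?><i>'
    '<d p="1.0,1,25,16777215">first comment</d>'
    '<d p="2.5,1,25,16777215">second comment!</d>'
    '</i>'
)


def find_danmaku_elements(xml_text):
    """Return every raw <d p="...">text</d> element as a string (assumed format)."""
    return re.findall(r'<d p="[^"]*">.*?</d>', xml_text, flags=re.S)


list_s = find_danmaku_elements(sample_xml)
for element in list_s:
    # Same extraction as in the article: keep the text between ">" and "</d".
    print(element.split(">")[1].replace("</d", ""))
```

In the real script, `r.text` would take the place of `sample_xml`. If the text comes back garbled, setting `r.encoding = 'utf-8'` before reading `r.text` may help.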

with open("commit.txt", "a+", encoding='utf-8') as f:
    for i in list_s:
        # Keep only the text between the opening tag and "</d"
        i = (i.split(">"))[1].replace("</d", "")
        # Strip punctuation
        i = i.replace("?", "").replace(".", "").replace(",", "").replace("+", "").replace("!", "").replace("....", "").replace(".......", "")
        f.write(str(i) + "\n")

Once the data is fetched, it is cleaned: the XML tags and punctuation (? . , …) are removed.

Finally, the cleaned comments are saved to the text file commit.txt.
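The chained `replace` calls can also be expressed as a translation table. This is an alternative sketch rather than the article's code, and the punctuation set (ASCII marks plus a few full-width Chinese marks) is an assumption.

```python
# Punctuation to strip. This set is an assumption: ASCII marks plus some
# full-width Chinese marks that are common in comments.
PUNCTUATION = "?.,+!…？。，！"
STRIP_TABLE = str.maketrans("", "", PUNCTUATION)


def clean_comment(raw_element):
    """Extract the text of one raw <d ...>text</d> element and strip punctuation."""
    text = raw_element.split(">")[1].replace("</d", "")
    return text.translate(STRIP_TABLE)


print(clean_comment('<d p="1.0,1,25,16777215">so pretty!!!</d>'))
```

A single `translate` pass removes every listed character at once, so the order of the marks does not matter.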

3. Analyze the comment data

1. Word cloud analysis

import jieba
from stylecloud import gen_stylecloud

### 1. Word cloud analysis
def analysis1():
    with open("commit.txt", 'r', encoding='utf8') as f:
        st = f.read()

        #print(st)
        word_list = jieba.cut(st)
        result = " ".join(word_list)  # separate the segmented words with spaces
        icon_name = 'fab fa-qq'
        # picture save name
        picp = '1.png'
        gen_stylecloud(text=result,
                       font_path='simsun.ttc',  # a Chinese font is required, otherwise the characters render incorrectly
                       # icon_name='fas fa-envira',
                       icon_name='fas fa-cannabis',
                       max_words=100,
                       max_font_size=70,
                       output_name='icon1.png',
                       )

Results 1

Analysis

In the word cloud, a larger font means the word is mentioned more often.

  1. This is a skin for the King of Glory hero Mi Yue, so the corresponding keywords naturally appear.
  2. Mi Yue's Bai Jingjing skin is modeled on the character Bai Jingjing from the movie “A Chinese Odyssey”. In the film, Bai Jingjing is linked to the character Zhi Zunbao, which is also the hot topic, talking point, and selling point of this skin.
  3. Because Bai Jingjing is linked to Zhi Zunbao, the discussion extends to Zixia Fairy, who is also connected to Zhi Zunbao, and to Sun Wukong (the monkey) in King of Glory.
  4. Common function words (“I”, “have”, “of”, and so on) also appear in the cloud, which reflects how much discussion this skin has generated.
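To make “a larger font means more mentions” concrete, the counts behind a word cloud can be listed directly. This is a minimal sketch that assumes the text has already been segmented into words (for example by `jieba.cut`); the token list is invented, and dropping one-character tokens is an assumption meant to filter out function words.

```python
from collections import Counter

# Invented tokens standing in for the output of a word segmenter.
tokens = ["白晶晶", "至尊宝", "皮肤", "白晶晶", "我", "紫霞", "皮肤", "白晶晶", "的"]


def top_words(words, n=3, min_length=2):
    """Count words of at least `min_length` characters and return the n most common."""
    counts = Counter(w for w in words if len(w) >= min_length)
    return counts.most_common(n)


print(top_words(tokens))
```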

2. Sentiment analysis

For sentiment analysis we can use the snownlp library.

Import the appropriate library
from snownlp import SnowNLP

Each comment is given a score between 0 and 1: scores from 0 to 0.5 are negative, and scores from 0.5 to 1 are positive.
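The scoring rule can be written as a small helper. This is only a sketch: the name `sentiment_label` is hypothetical, and treating a score of exactly 0.5 as positive is an assumption, since the article does not say which side the boundary belongs to.

```python
def sentiment_label(score, threshold=0.5):
    """Map a score in [0, 1] to 'negative' or 'positive' (hypothetical helper)."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0 and 1")
    # Assumption: a score exactly at the threshold counts as positive.
    return "positive" if score >= threshold else "negative"


print(sentiment_label(0.57))
print(sentiment_label(0.2))
```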

Enough talk; let's try it out.

First, score the first 5 comments to see the effect.

with open("commit.txt", 'r', encoding='utf8') as f:
   data = f.readlines()


for i in data[0:5]:
   sentiments = SnowNLP(i).sentiments
   print(i)
   print(sentiments)
   print("--------------------")

Results 2

Analysis

The scores show that the snownlp library gives a reasonable estimate of each comment's sentiment.

Average score of sentiment analysis

With per-comment scores available, we can score every comment and take the average, which gives a rough idea of whether the overall sentiment of netizens is negative or positive.

with open("commit.txt", 'r', encoding='utf8') as f:
    data = f.readlines()


total = 0
for i in data:
    sentiments = SnowNLP(i).sentiments
    #print(i)
    total = total + sentiments
    #print(sentiments)
    #print("--------------------")
print("Average sentiment score: " + str(total / len(data)))
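One detail worth noting: blank lines in commit.txt carry no comment but still count toward `len(data)`. A hedged variant that skips them is sketched below. The scorer is passed in as a function so the averaging logic can be tried without the sentiment library; `fake_score` and the sample lines are invented stand-ins.

```python
def average_score(lines, score_fn):
    """Average score_fn over the non-blank lines only."""
    scores = [score_fn(line.strip()) for line in lines if line.strip()]
    if not scores:
        raise ValueError("no non-empty comments to score")
    return sum(scores) / len(scores)


def fake_score(text):
    """Stand-in scorer for illustration only; not a real sentiment model."""
    return 0.9 if "good" in text else 0.1


lines = ["good skin\n", "\n", "not for me\n", "really good\n"]
print(average_score(lines, fake_score))
```

With the article's setup, `score_fn` would be something like `lambda text: SnowNLP(text).sentiments`.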

Results 3

Analysis

  1. The average score is 0.5713098907113867, which is greater than 0.5, so the overall sentiment leans positive.
  2. This suggests that netizens generally like the skin, and that discussion of the related “A Chinese Odyssey” characters (Zhi Zunbao, Zixia, Bai Jingjing) is also mostly positive.

4. Summary

  1. We reviewed how to fetch bullet-screen comments from Bilibili.
  2. Two methods, word cloud and sentiment analysis, were used to analyze netizens' comments, which helps in gauging public opinion and sentiment about a topic (such as the Bai Jingjing skin in this article).
  3. Comment analysis is not limited to visualizations such as word clouds; sentiment analysis can also be used to gauge what netizens think.
