This is the first day of my participation in the August Text Challenge.
Hello, everyone, I am the talented brother.
On Thursday (August 12), Mango TV aired the first episode of “Brother of Breaking through Thorns”. It was an instant hit: it plays the emotional card, it is not greasy, it is well produced, and it has drawn a lot of attention online!
What kind of chemistry do you get when you put young men, rockers, rappers, dancers, singers, kung fu actors and so on in the same variety show?
Today, let's take a look at the army of roughly 100,000 bullet comments (danmu) from episode 1!
1. Preview data
The dataset contains 97,331 bullet comments collected from the three parts (upper, middle and lower) of episode 1 on Mango TV. The collection process is shown in the code at the end of the article (it is fairly simple).
import pandas as pd

df = pd.read_excel('The brother who cut through the thorns.xlsx')
# Data field information
df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 97331 entries, 4 to 33794
Data columns (total 7 columns):
 #   Column              Non-Null Count  Dtype
---  ------              --------------  -----
 0   ids                 97331 non-null  string
 1   uid                 97331 non-null  Int64
 2   content             97331 non-null  string
 3   time                97331 non-null  Int64
 4   v2_up_count         97331 non-null  Int64
 5   Time                97331 non-null  Int64
 6   Upper middle lower  97331 non-null  string
dtypes: Int64(4), string(3)
memory usage: 6.3 MB
The fields in the data have the following meanings:
ids: the ID of the bullet comment
uid: the user ID
content: the text of the bullet comment
time: when the bullet comment was sent (milliseconds after the start of the video)
v2_up_count: the number of likes
Time: when the bullet comment was sent (in minutes)
Upper middle lower: which of the three parts (upper, middle or lower) of episode 1 the comment belongs to
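To make the relation between the millisecond field and the minute field concrete, here is a minimal sketch. It assumes the millisecond value is an offset from the start of the video, and the sample values are made up for illustration.

```python
# Hypothetical offsets in milliseconds (not taken from the real data).
sample_ms = [0, 59_999, 60_000, 125_000]


def to_minute(ms: int) -> int:
    """Floor a millisecond offset to whole minutes, like `df.time // 60000`."""
    return ms // 60_000


minutes = [to_minute(ms) for ms in sample_ms]
print(minutes)
```

Floor division keeps every comment sent within the same minute in the same bucket, which is what a per-minute count needs.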
# Data preview
df.sort_values(by=['Upper middle lower', 'time'], inplace=True)  # Sort by part (upper, middle, lower) and time
df.head()
(
    df.groupby('Upper middle lower')
    # Named aggregation; dict unpacking is needed because the labels contain spaces
    .agg(**{'Barrage number': ('ids', 'count'), 'how long': ('time', 'max')})
    .reset_index().style
    .bar(subset='Barrage number', align='zero')
    .bar(subset='how long', color='orange', align='zero')
)
Basically every part of episode 1 is packed with bullet comments (about 360 per minute).
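The per-minute figure can be checked with simple arithmetic. This is a rough sketch that assumes a total running time of about four and a half hours, the duration quoted elsewhere in this article.

```python
total_comments = 97_331      # number of bullet comments in the dataset
total_minutes = 4.5 * 60     # assumed running time: about four and a half hours

per_minute = total_comments / total_minutes
print(round(per_minute))
```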
2. Overall word cloud of the bullet comments
The word cloud here was drawn with the word cloud tool “”.
From the overall word cloud, we can see that the audience basically watched the whole show amid laughter and admiring “ah-ah-ah-ah-ah”.
I have to say, this variety show is a lot of fun.
If we remove these onomatopoeic words and some words of praise, we find that the Great Bay Area (mainly Chen Xiaochun, Xie Tianhua, Lin Xiaofeng, Zhang Zhilin and Liang Hanwen), Zhao Wenzhuo, Lee Seung-hyun, Ouyang Jing and Zhang Yunlong are the most popular guests among danmu users!
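The filtering step described above can be sketched as follows. The token list and the filler-word set are hypothetical placeholders; the article does not show the actual word segmentation or stop list.

```python
from collections import Counter

# Hypothetical tokens, standing in for segmented bullet comments.
tokens = [
    "hahaha", "Zhao Wenzhuo", "ahhh", "Great Bay Area",
    "hahaha", "Great Bay Area", "Zhao Wenzhuo", "Great Bay Area",
]
# Assumed filler words (onomatopoeia and generic praise) to drop.
filler = {"hahaha", "ahhh"}

counts = Counter(t for t in tokens if t not in filler)
top = counts.most_common(2)
print(top)
```

The resulting counts are what a word cloud tool would typically take as its word-frequency input.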
3. Most-liked bullet comments
Most of the top 10 most-liked bullet comments are from episode 1, and most of those come from the part where Cho comes in and sings “Meteor Shower” as a meteor hammer, hahaha!
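(
    df.sort_values(by='v2_up_count', ascending=False).head(10).style
    # In pandas 1.4+, hide_index/hide_columns are replaced by
    # .hide(axis='index') and .hide([...], axis='columns')
    .hide_index()
    .hide_columns(['ids', 'uid', 'time'])
)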
The third most-liked comment, “Chen Xiaochun: a daughter really is nice”, appeared in the segment where Zhao Wenzhuo's friends and family sent their blessings: while Zhao Wenzhuo's son and daughter were giving theirs, the camera cut to Chen Xiaochun, and netizens came up with this classic bullet comment. Very heartwarming, isn't it!
“I finally understand Qi Wei's happiness” must have been written by a female netizen. It comes from around the 39-minute mark, when Lee Seung-hyun sang “Flying in the Sky”.
4. The craziest bullet commenters
Plenty of viewers have watched the show a second, third or even more times. So how bullet-comment-crazy can a viewer get? Let's take a look!
df.groupby('uid')['ids'].count().sort_values(ascending=False).to_frame('Barrage number').reset_index().head()
We can see that over the four and a half hours of episode 1, one netizen posted a total of 176 bullet comments, an average of 0.65 per minute.
Sampling 18 of this netizen's bullet comments, we find that he really loves the show itself, not just one particular brother!
(
    df[df['uid'] == 3752327606].sample(18).style
    .hide_index()
    .hide_columns(subset='ids')
)
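The per-user count and the 0.65-per-minute average can be reproduced with the standard library alone. This is a minimal sketch: the uid list is made up, and the running time is assumed to be four and a half hours as stated above.

```python
from collections import Counter

# Hypothetical uid column: one entry per bullet comment.
uids = [101, 102, 101, 103, 101, 102]

# Same idea as groupby('uid')['ids'].count(): comments per user.
per_user = Counter(uids)
top_uid, top_count = per_user.most_common(1)[0]

# Average rate for the most active real user (176 comments).
rate = round(176 / (4.5 * 60), 2)
print(top_uid, top_count, rate)
```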
5. The hottest brothers in the bullet comments
Which of the 33 brothers are the most popular in episode 1, which lasts four and a half hours?
From the overall word cloud of the bullet comments, we can see that the most frequent keywords are the Great Bay Area (mainly Chen Xiaochun, Xie Tianhua, Lin Xiaofeng, Zhang Zhilin and Liang Hanwen), Zhao Wenzhuo, Lee Seung-hyun, Ouyang Jing and Zhang Yunlong.
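The Great Bay Area brothers
# Keywords are translated here; use the forms that actually appear in the comments.
# No spaces around "|": the pattern is a regex, so spaces would become part of each keyword.
df[df['content'].astype('str').str.contains('Great Bay Area|Chen Xiaochun|Brother Chun|Xie Tianhua|Zhang Zhilin|Lin Xiaofeng|Liang Hanwen')]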
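One thing to watch with `str.contains`: by default the pattern is treated as a regular expression, so any whitespace around `|` becomes part of the alternatives. The sketch below shows the effect with the standard `re` module on a few made-up comments.

```python
import re

# Hypothetical comments for illustration.
comments = ["Brother Chun is great", "Zhao Wenzhuo!", "Great Bay Area forever"]

tight = re.compile("Great Bay Area|Brother Chun")
spaced = re.compile("Great Bay Area | Brother Chun")  # spaces are matched literally

tight_hits = [c for c in comments if tight.search(c)]
spaced_hits = [c for c in comments if spaced.search(c)]
print(tight_hits)
print(spaced_hits)
```

With the spaced pattern, the first comment is missed because it does not contain " Brother Chun" with a leading space.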
Zhao Wenzhuo (Vincent Zhao)
# Keywords are machine-translated; use the forms that actually appear in the comments.
df[df['content'].astype('str').str.contains('Zhao Wenzhuo|ZhuoGe|winjumper')]
Ha, ha, ha, ha, ha, ha
Lee Seung-hyun
Lee Seung-hyun, the man behind “Qi Wei's happiness”, is so cool.
# Keywords are translated here; use the forms that actually appear in the comments.
df[df['content'].astype('str').str.contains('Lee Seung-hyun|Seung-hyun')]
Ouyang Jing
The rapper Ouyang Jing
df[df['content'].astype('str').str.contains('Ouyang Jing')]
Zhang Yunlong
The Zhang Yunlong “Ride the Dragon fast Xu” pairing is so sweet. When Zhang Yunlong saw Yan Chengxu for the first time, he excitedly went straight up to shake his hand and blurted out: “I used to imitate you!”
df[df['content'].astype('str').str.contains('Zhang Yunlong|Yunlong')]
To explore more bullet comments, or the ones about your favorite brother, reply 955 to get the data from the Mango TV folder and play with it yourself!
6. How do the bullet comments rate Mango TV?
Quite a lot of bullet comments are complimenting Mango TV itself.
df[df['content'].astype('str').str.contains('mango')]
7. Bullet comment data collection program
Here is the source code.
import requests
import pandas as pd

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.66 Safari/537.36",
}
datas = []
# Each part is just over 90 minutes long, so 100 requests are enough!
for i in range(100):
    print(f'\r{i}', end='')
    # Note the pattern of the interface address for each part
    url = f'https://bullet-ali.hitv.com/bullet/2021/08/17/192249/13137070/{i}.json'
    r = requests.get(url, headers=headers)
    if r.status_code == 200:
        data = r.json()
        data = data['data']['items']
        datas.extend(data)
    else:
        break
df = pd.DataFrame(datas)
df = df[['ids', 'uid', 'content', 'time', 'v2_up_count']].fillna(0)
# Convert milliseconds to whole minutes
df['time'] = df.time // 60000
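The loop-until-failure logic of the collector can be tried without touching the network. In this sketch, `collect` and `fake_fetch` are hypothetical helpers introduced only for illustration: `fetch` stands in for the HTTP request and returns `None` where the real code would see a non-200 status.

```python
from typing import Callable, Optional


def collect(fetch: Callable[[int], Optional[list]], max_minutes: int = 100) -> list:
    """Gather items minute by minute until a minute is missing or the cap is hit."""
    items = []
    for minute in range(max_minutes):
        batch = fetch(minute)
        if batch is None:  # stands in for a non-200 response
            break
        items.extend(batch)
    return items


def fake_fetch(minute: int) -> Optional[list]:
    """Hypothetical stand-in for the per-minute JSON endpoint."""
    pages = {
        0: [{"content": "a"}],
        1: [{"content": "b"}, {"content": "c"}],
    }
    return pages.get(minute)


result = collect(fake_fetch)
print(len(result))
```

Separating the fetch step from the loop makes it easy to swap in the real request function later, or to add retries, without changing the collection logic.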
That's all for this article. This variety show is well worth a watch; it is a real nostalgia trip.