Hi everyone, I'm K!
Recently the movie Changjin Lake has been all over my friends' feeds. Rather than go to the cinema and fight over a couple's seat, let's just analyze the reviews and see what people think of it.
First, locate the target page:
movie.douban.com/subject/258…
Write a crawler to grab the following four fields: username, star rating, comment time, and comment.
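The crawler itself is not shown in this post. As a minimal sketch of its last step only, assuming the crawler yields one tuple per comment, the four fields could be written to douban.csv without a header row, which is the layout the pandas step expects when it passes names=. The save_comments helper and the sample rows are hypothetical and made up for illustration.

```python
import csv

# Hypothetical records a crawler might yield:
# (username, star rating, comment time, comment). All values are made up.
rows = [
    ("user_a", "recommended", "2021-09-30 10:15:00", "Well shot, very moving."),
    ("user_b", "poor", "2021-10-01 09:05:30", "Too long, thin plot."),
]

def save_comments(rows, path="douban.csv"):
    """Write one comment per line with no header row."""
    # newline="" lets the csv module control line endings;
    # utf-8 matches the encoding used when reading the file back.
    with open(path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(rows)

save_comments(rows)
```

Comments that contain commas are quoted automatically by csv.writer, so they still load as a single column.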
Then use pandas to import the data and do some simple processing:
import pandas as pd
import os
file_path = os.path.join("douban.csv")
# Read the CSV (no header row) and name the four columns
df = pd.read_csv(file_path, encoding='utf-8', names=["Username", "Star rating", "Comment time", "Comment"])
df.head()
# Count the comments for each star rating, ordered by rating label
star_num = df["Star rating"].value_counts()
star_num = star_num.sort_index()
star_num

37
Name: Star rating, dtype: int64
Douban short review score ratio
from pyecharts.charts import Pie, Bar, Line, Page
from pyecharts import options as opts
from pyecharts.globals import SymbolType
# Build [label, count] pairs for the chart
data_pair = [list(z) for z in zip(star_num.index.tolist(), star_num.values.tolist())]
# Pie chart
pie1 = Pie(init_opts=opts.InitOpts(width='800px', height='400px'))
pie1.add('', data_pair, radius=['35%', '60%'])
pie1.set_global_opts(title_opts=opts.TitleOpts(title='Percentage of Douban short comments'),
                     legend_opts=opts.LegendOpts(orient='vertical', pos_top='15%', pos_left='2%')
                     )
pie1.set_series_opts(label_opts=opts.LabelOpts(formatter='{b}:{d}%'))
pie1.render_notebook()
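In the label formatter '{b}:{d}%', {b} is the slice name and {d} is its percentage of the total. To check the numbers behind the pie chart, here is a small sketch that computes the same shares by hand. The counts are made up (the real star_num values are not reproduced here), and the shares helper is hypothetical.

```python
# Hypothetical rating counts, for illustration only.
star_counts = {"strongly recommended": 50, "recommended": 30, "okay": 15, "poor": 5}

def shares(counts):
    """Return each label's share of the total as a percentage (2 decimals)."""
    total = sum(counts.values())
    return {label: round(100 * n / total, 2) for label, n in counts.items()}

pct = shares(star_counts)
for label, value in pct.items():
    # Same "name:percent%" layout as the chart labels
    print(f"{label}:{value}%")
```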
Chart of comment volume
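The line chart code uses comment_date, which is not defined anywhere in the code shown. A minimal sketch of one way it might be built, assuming it holds the number of comments per calendar day and that the "Comment time" column contains timestamps like "2021-09-30 10:15:00" (both are assumptions, and the sample rows are made up):

```python
import pandas as pd

# Made-up sample rows standing in for the real douban.csv data.
df = pd.DataFrame({
    "Comment time": [
        "2021-09-30 10:15:00",
        "2021-09-29 22:01:13",
        "2021-09-30 18:40:55",
        "2021-10-01 09:05:30",
    ]
})

# Keep only the calendar day, then count comments per day in date order.
day = pd.to_datetime(df["Comment time"]).dt.strftime("%Y-%m-%d")
comment_date = day.value_counts().sort_index()

print(comment_date.index.tolist())
print(comment_date.values.tolist())
```

Formatting the dates as strings keeps the x-axis labels readable when they are passed to add_xaxis.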
# line chart
line1 = Line(init_opts=opts.InitOpts(width='800px', height='400px'))
line1.add_xaxis(comment_date.index.tolist())
line1.add_yaxis('', comment_date.values.tolist(),
                # areastyle_opts=opts.AreaStyleOpts(opacity=0.5),
                label_opts=opts.LabelOpts(is_show=False))
line1.set_global_opts(title_opts=opts.TitleOpts(title='Chart of comment volume'),
                      # toolbox_opts=opts.ToolboxOpts(),
                      visualmap_opts=opts.VisualMapOpts(max_=140))
line1.set_series_opts(linestyle_opts=opts.LineStyleOpts(width=4))
line1.render_notebook()
The film was released on September 30. Comment volume started building on September 29, peaked on the 30th, and seems to have lost momentum by October 1.
Word cloud
Positive reviews
import jieba
def get_cut_words(content_series):
    # Read the stop words table
    stop_words = []
    with open(r"hit_stopwords.txt", 'r', encoding='utf-8') as f:
        lines = f.readlines()
        for line in lines:
            stop_words.append(line.strip())
    # Add custom keywords so jieba keeps them as single tokens
    my_words = ['Chosin Lake', 'Volunteers']
    for i in my_words:
        jieba.add_word(i)
    # Custom stop words
    my_stop_words = ['movie', "Chosin Lake", "War"]
    stop_words.extend(my_stop_words)
    # Word segmentation
    word_num = jieba.lcut(content_series.str.cat(sep='. '), cut_all=False)
    # Drop stop words and tokens shorter than two characters
    word_num_selected = [i for i in word_num if i not in stop_words and len(i)>=2]
    return word_num_selected
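The filtering step at the end of get_cut_words can be tried on its own, without jieba or the stop words file. The sketch below uses a hypothetical filter_tokens helper and made-up tokens. It also converts the stop words to a set, which is a small variation on the list used above and makes each membership check faster on long token lists.

```python
def filter_tokens(tokens, stop_words, min_len=2):
    """Drop stop words and tokens shorter than min_len characters."""
    stop_set = set(stop_words)  # set lookup avoids scanning a list per token
    return [t for t in tokens if t not in stop_set and len(t) >= min_len]

# Made-up tokens standing in for the output of jieba.lcut.
tokens = ["movie", "a", "sacrifice", "soldiers", ",", "Chosin Lake", "should", "forget"]
stop_words = ["movie", "Chosin Lake", "War"]

kept = filter_tokens(tokens, stop_words)
print(kept)
```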
text1 = get_cut_words(content_series=df[(df['Star rating'] == 'strongly recommended') | (df['Star rating'] == 'recommended')]['Comment'])
text1[:5]

['sacrifice', 'ice', 'soldiers', 'should', 'forget']
import stylecloud
from IPython.display import Image  # Used to display local images in JupyterLab
# Draw the word cloud
stylecloud.gen_stylecloud(text=' '.join(text1),
                          max_words=1000,
                          collocations=False,
                          font_path=r'Classic variety style brief.ttf',
                          icon_name='fas fa-thumbs-up',
                          size=360,
                          output_name='Douban positive rating word cloud image.png')
Image(filename='Douban positive rating word cloud image.png')
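A word cloud sizes each word by how often it appears. To look at the raw frequencies behind the picture, a quick sketch with collections.Counter can help. The token list here is made up. In the post it would be the text1 list returned by get_cut_words.

```python
from collections import Counter

# Made-up tokens, for illustration only.
tokens = ["sacrifice", "soldiers", "sacrifice", "forget", "soldiers", "sacrifice"]

# The two most frequent words with their counts
top_words = Counter(tokens).most_common(2)
print(top_words)
```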
Negative reviews
# '还行' is Douban's "okay" rating label
text2 = get_cut_words(content_series=df[(df['Star rating'] == '还行') | (df['Star rating'] == 'poor')]['Comment'])
text2[:5]

['a bit', 'disappointment', 'plot', 'business as usual', 'characters']
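The positive and negative subsets are both selected by chaining == comparisons with |. As an alternative sketch, not the author's code, pandas' Series.isin expresses the same selection more compactly once there are several labels per group. The DataFrame below is made up, and its rating labels simply mirror the ones used in this post.

```python
import pandas as pd

# Made-up rows, for illustration only.
df = pd.DataFrame({
    "Star rating": ["strongly recommended", "poor", "recommended", "还行"],
    "Comment": ["c1", "c2", "c3", "c4"],
})

# Select comments whose rating falls in each group of labels.
positive = df[df["Star rating"].isin(["strongly recommended", "recommended"])]["Comment"]
negative = df[df["Star rating"].isin(["还行", "poor"])]["Comment"]

print(positive.tolist())
print(negative.tolist())
```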
# Draw word cloud
stylecloud.gen_stylecloud(text=' '.join(text2),
                          max_words=1000,
                          collocations=False,
                          font_path=r'Classic variety style brief.ttf',
                          icon_name='fas fa-thumbs-down',
                          size=350,
                          output_name='Douban negative rating word cloud image.png')
Image(filename='Douban negative rating word cloud image.png')