Hi everyone, I'm K Student!

Recently, the movie The Battle at Lake Changjin has been all over WeChat Moments. Rather than fighting for a couple's seat at the cinema, let's analyze its Douban reviews and see what viewers actually think of it.

First, locate the target page:

movie.douban.com/subject/258…

Use a crawler to scrape the following four fields: username, star rating, comment time, and comment.
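The crawler itself is not shown here. As a minimal sketch, assume it has already produced a list of (username, star rating, comment time, comment) tuples. The hypothetical helper `save_reviews` below writes them to douban.csv without a header row, which is the layout the later `pd.read_csv(..., names=[...])` call expects.

```python
import csv
from typing import Iterable, Tuple

# One scraped review: (username, star rating, comment time, comment).
# This record shape is an assumption about the crawler's output.
Review = Tuple[str, str, str, str]


def save_reviews(reviews: Iterable[Review], path: str = "douban.csv") -> int:
    """Write reviews to a header-less UTF-8 CSV and return the number of rows.

    Hypothetical helper, not part of the original crawler.
    """
    count = 0
    # newline="" lets the csv module handle line endings, including
    # newlines embedded inside a comment (those fields get quoted).
    with open(path, "w", encoding="utf-8", newline="") as f:
        writer = csv.writer(f)
        for row in reviews:
            writer.writerow(row)
            count += 1
    return count
```

Using the csv module rather than joining strings by hand matters here because comments often contain commas and line breaks, which must be quoted for pandas to read the file back correctly.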

Then use pandas to import the data and do some simple processing:

import pandas as pd
import os

file_path = os.path.join("douban.csv")

# Read the CSV (it has no header row) and name the four columns
df = pd.read_csv(file_path, encoding='utf-8', names=["Username", "Star rating", "Comment time", "Comment"])
df.head()

# Count how many comments fall under each star rating
star_num = df['Star rating'].value_counts()
star_num = star_num.sort_index()
star_num

...    37
Name: Star rating, dtype: int64
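One thing worth checking: `sort_index()` orders the Chinese rating labels by Unicode code point, not by number of stars. A minimal stdlib sketch, assuming the column holds Douban's five labels 力荐 (5 stars), 推荐 (4), 还行 (3), 较差 (2) and 很差 (1), counts the ratings and orders them best to worst with an explicit list. `count_by_star` and `STAR_ORDER` are hypothetical names.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# Douban rating labels from 5 stars down to 1 star (assumed column values)
STAR_ORDER = ["力荐", "推荐", "还行", "较差", "很差"]


def count_by_star(ratings: Iterable[str]) -> List[Tuple[str, int]]:
    """Return (label, count) pairs ordered from 5 stars down to 1 star.

    Labels not in STAR_ORDER (e.g. missing ratings) are appended at the end,
    sorted alphabetically.
    """
    counts = Counter(ratings)
    ordered = [(label, counts[label]) for label in STAR_ORDER if label in counts]
    extras = sorted((label, n) for label, n in counts.items() if label not in STAR_ORDER)
    return ordered + extras
```

In pandas, something like `star_num.reindex(STAR_ORDER).dropna()` should give the same star order instead of `sort_index()`.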

Star-rating breakdown of Douban short reviews

from pyecharts.charts import Pie, Bar, Line, Page
from pyecharts import options as opts 
from pyecharts.globals import SymbolType

# Prepare the data as [rating label, count] pairs
data_pair = [list(z) for z in zip(star_num.index.tolist(), star_num.values.tolist())]

# Pie (ring) chart
pie1 = Pie(init_opts=opts.InitOpts(width='800px', height='400px'))
pie1.add(' ', data_pair, radius=['35%', '60%'])
pie1.set_global_opts(title_opts=opts.TitleOpts(title='Star-rating share of Douban short reviews'),
                     legend_opts=opts.LegendOpts(orient='vertical', pos_top='15%', pos_left='2%')
                    )
# {b} is the slice name, {d} its percentage of the total
pie1.set_series_opts(label_opts=opts.LabelOpts(formatter='{b}:{d}%'))
pie1.render_notebook()
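In a pie chart, the `{d}` placeholder in the label formatter is filled with each slice's share of the total. As a rough cross-check of the displayed numbers, this sketch computes the same shares from the `[label, count]` pairs. Rounding to two decimals is an assumption here; ECharts may adjust the last digit so the slices add up to exactly 100%.

```python
from typing import List, Sequence, Tuple


def pie_shares(data_pair: Sequence[Sequence]) -> List[Tuple[str, float]]:
    """Turn [label, count] pairs into (label, percent) pairs rounded to 2 decimals."""
    total = sum(count for _, count in data_pair)
    if total == 0:
        # Avoid division by zero when there is no data
        return [(label, 0.0) for label, _ in data_pair]
    return [(label, round(100 * count / total, 2)) for label, count in data_pair]
```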


Comment volume over time

# Count comments per day (assumes 'Comment time' holds parseable date-time strings)
comment_date = pd.to_datetime(df['Comment time']).dt.date.value_counts().sort_index()

# Line chart
line1 = Line(init_opts=opts.InitOpts(width='800px', height='400px'))
line1.add_xaxis(comment_date.index.astype(str).tolist())
line1.add_yaxis(' ', comment_date.values.tolist(),
                # areastyle_opts=opts.AreaStyleOpts(opacity=0.5),
                label_opts=opts.LabelOpts(is_show=False))
line1.set_global_opts(title_opts=opts.TitleOpts(title='Comment volume over time'),
                      # toolbox_opts=opts.ToolboxOpts(),
                      visualmap_opts=opts.VisualMapOpts(max_=140))
line1.set_series_opts(linestyle_opts=opts.LineStyleOpts(width=4))
line1.render_notebook()

The film was released on September 30. Comment volume began building on September 29, peaked on the 30th, and appears to have dropped off by October 1.
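The peak described above can also be located programmatically. This stdlib sketch assumes comment times are strings like `2021-09-30 12:34:56` (the exact format of the scraped field is an assumption). The hypothetical helper `daily_counts` mirrors what `comment_date` holds, and `peak_day` picks the busiest day.

```python
from collections import Counter
from datetime import date, datetime
from typing import Dict, Iterable, Tuple


def daily_counts(times: Iterable[str], fmt: str = "%Y-%m-%d %H:%M:%S") -> Dict[date, int]:
    """Count comments per calendar day, returned in date order."""
    counts = Counter(datetime.strptime(t, fmt).date() for t in times)
    return dict(sorted(counts.items()))


def peak_day(counts: Dict[date, int]) -> Tuple[date, int]:
    """Return the (day, count) pair with the most comments."""
    return max(counts.items(), key=lambda kv: kv[1])
```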

Word cloud

Positive reviews

import jieba

def get_cut_words(content_series):
    # Read the stop-word list (one word per line)
    stop_words = []
    with open(r"hit_stopwords.txt", 'r', encoding='utf-8') as f:
        for line in f:
            stop_words.append(line.strip())

    # Add custom words so jieba keeps them as single tokens
    my_words = ['长津湖', '志愿军']  # "Lake Changjin", "Volunteers"
    for word in my_words:
        jieba.add_word(word)

    # Custom stop words
    my_stop_words = ['电影', '长津湖', '战争']  # "movie", "Lake Changjin", "war"
    stop_words.extend(my_stop_words)
    stop_words = set(stop_words)  # set gives fast membership tests

    # Word segmentation: join all comments into one string, then cut in accurate mode
    word_num = jieba.lcut(content_series.str.cat(sep='。'), cut_all=False)

    # Conditional filter: drop stop words and single-character tokens
    word_num_selected = [i for i in word_num if i not in stop_words and len(i) >= 2]

    return word_num_selected
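The conditional filter at the end of `get_cut_words` decides what reaches the word cloud. Here it is in isolation as a small sketch that takes already-segmented tokens, so jieba is not needed to run it. `filter_tokens` is a hypothetical name, and storing the stop words in a `set` is a choice made here for constant-time lookups.

```python
from typing import Iterable, List


def filter_tokens(tokens: Iterable[str], stop_words: Iterable[str], min_len: int = 2) -> List[str]:
    """Drop stop words and tokens shorter than min_len, preserving order."""
    stop = set(stop_words)  # O(1) membership instead of scanning a list
    return [t for t in tokens if t not in stop and len(t) >= min_len]
```

The `len >= 2` rule removes punctuation and most single-character function words (such as 的) that would otherwise dominate a Chinese word cloud.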
# Tokens from positive reviews: 力荐 ("strongly recommended") and 推荐 ("recommended")
text1 = get_cut_words(content_series=df[(df['Star rating'] == '力荐') | (df['Star rating'] == '推荐')]['Comment'])
text1[:5]

['sacrifice', 'ice', 'soldiers', 'should', 'forget']
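A word cloud sizes each word roughly by how often it appears, so it can help to peek at the most frequent tokens before drawing. This sketch applies `collections.Counter` to a token list such as `text1`; `top_words` is a hypothetical helper.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def top_words(tokens: Sequence[str], n: int = 10) -> List[Tuple[str, int]]:
    """Return the n most common tokens with counts (ties keep first-seen order)."""
    return Counter(tokens).most_common(n)
```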
import stylecloud
from IPython.display import Image  # used to display local images in JupyterLab

# Draw the word cloud for positive reviews
stylecloud.gen_stylecloud(text=' '.join(text1),
                          max_words=1000,
                          collocations=False,
                          font_path=r'Classic variety style brief.TTF',  # local font file with Chinese glyphs
                          icon_name='fas fa-thumbs-up',
                          size=360,
                          output_name='Douban positive rating word cloud image.png')

Image(filename='Douban positive rating word cloud image.png')
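Why join the tokens with spaces? stylecloud builds on the `wordcloud` package, which pulls words out of the text with a regular expression. Assuming a default pattern along the lines of `\w[\w']+` (treat the exact pattern as an assumption), Chinese characters count as `\w`, so an unsegmented sentence would come back as one long "word". Joining jieba's output with spaces supplies the word boundaries, and `collocations=False` stops the library from adding two-word phrases on top. A small check:

```python
import re
from typing import List

# Assumed word-extraction pattern, modelled on wordcloud's default tokenizer
WORD_PATTERN = r"\w[\w']+"


def extract_words(text: str) -> List[str]:
    """Split text into 'words' the way a regex-based tokenizer would."""
    return re.findall(WORD_PATTERN, text)
```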

Negative reviews

# Tokens from lukewarm and negative reviews: 还行 ("OK") and 较差 ("poor")
text2 = get_cut_words(content_series=df[(df['Star rating'] == '还行') | (df['Star rating'] == '较差')]['Comment'])
text2[:5]

['a bit', 'disappointment', 'plot', 'business as usual', 'characters']
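Chaining `==` comparisons with `|` works, but selecting rows whose rating is in a set of labels states the intent more directly. A stdlib sketch over plain (rating, comment) records, with `comments_with_rating`, `POSITIVE` and `NEGATIVE` as hypothetical names and the same Chinese labels assumed:

```python
from typing import Iterable, List, Set, Tuple

POSITIVE = {"力荐", "推荐"}  # "strongly recommended", "recommended"
NEGATIVE = {"还行", "较差"}  # "OK", "poor" (the two labels used above)


def comments_with_rating(records: Iterable[Tuple[str, str]], labels: Set[str]) -> List[str]:
    """Return the comments whose star-rating label is in labels."""
    return [comment for rating, comment in records if rating in labels]
```

The pandas equivalent is `df[df['Star rating'].isin(['还行', '较差'])]['Comment']`.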
# Draw the word cloud for negative reviews
stylecloud.gen_stylecloud(text=' '.join(text2),
                          max_words=1000,
                          collocations=False,
                          font_path=r'Classic variety style brief.TTF',  # local font file with Chinese glyphs
                          icon_name='fas fa-thumbs-down',
                          size=350,
                          output_name='Douban negative rating word cloud image.png')

Image(filename='Douban negative rating word cloud image.png')