Hi everyone, I'm K Student!

Recently, the movie Changjin Lake (The Battle at Lake Changjin) has been all over WeChat Moments. Rather than fighting for a couples' seat at the cinema, let's look at the reviews honestly and see what viewers actually think of it.

First, locate the target page:

Movie.douban.com/subject/258…

Crawl the page and scrape the following four fields: username, star rating, comment time, and comment.
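The crawler itself is not shown here. Whatever scraper you use, the pandas step below reads `douban.csv` as a UTF-8 file with no header row and the four fields in the order above. A minimal sketch of that final save step, assuming the scraper yields one `(username, star rating, comment time, comment)` tuple per review; `save_comments` is a hypothetical helper, not part of any library:

```python
import csv
from typing import Iterable, Tuple

# One scraped review: (username, star rating, comment time, comment)
Review = Tuple[str, str, str, str]


def save_comments(rows: Iterable[Review], path: str = "douban.csv") -> int:
    """Write reviews as a header-less UTF-8 CSV and return how many rows were written.

    Hypothetical helper: the column order matches the names= list passed
    to pd.read_csv in the next step.
    """
    count = 0
    # newline="" lets the csv module manage line endings itself
    with open(path, "w", encoding="utf-8", newline="") as f:
        writer = csv.writer(f)
        for row in rows:
            # csv.writer quotes fields that contain commas, quotes or newlines,
            # so free-text comments survive the round trip
            writer.writerow(row)
            count += 1
    return count
```

Writing no header row is deliberate: `pd.read_csv(..., names=[...])` would otherwise treat a header line as the first review.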

Then import the data with pandas and do some simple processing:

import pandas as pd
import os

file_path = os.path.join("douban.csv")

# Read the scraped data; the CSV has no header row, so supply the column names
df = pd.read_csv(file_path, encoding='utf-8', names=["Username", "Star rating", "Comment time", "Comment"])
df.head()

# Count reviews per rating label
star_num = df['Star rating'].value_counts()
star_num = star_num.sort_index()
star_num

37
Name: star rating, dtype: int64
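`sort_index()` orders the labels alphabetically, not from best to worst. A minimal sketch that puts them in rating order instead, assuming the five Douban labels are translated as shown in `STAR_ORDER` (the article itself only uses "strongly recommended", "recommended", "OK" and "poor"; "very poor" is an assumption). `count_stars` is a hypothetical helper:

```python
import pandas as pd

# Assumed English translations of Douban's five rating labels, best to worst
STAR_ORDER = ["strongly recommended", "recommended", "OK", "poor", "very poor"]


def count_stars(df: pd.DataFrame, column: str = "Star rating") -> pd.Series:
    """Count reviews per rating label, ordered from best to worst."""
    counts = df[column].value_counts()
    # reindex imposes the rating order; labels with no reviews get 0,
    # and any label not listed in STAR_ORDER is dropped
    return counts.reindex(STAR_ORDER, fill_value=0)
```

For example, `count_stars(df)` can replace the `value_counts().sort_index()` pair above so the pie chart legend also follows the rating order.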

Rating distribution of Douban short reviews

from pyecharts.charts import Pie, Line
from pyecharts import options as opts

# Prepare the data as [label, count] pairs
data_pair = [list(z) for z in zip(star_num.index.tolist(), star_num.values.tolist())]

# Ring-style pie chart
pie1 = Pie(init_opts=opts.InitOpts(width='800px', height='400px'))
pie1.add(' ', data_pair, radius=['35%', '60%'])
pie1.set_global_opts(title_opts=opts.TitleOpts(title='Percentage of Douban short comments'),
                     legend_opts=opts.LegendOpts(orient='vertical', pos_top='15%', pos_left='2%'))
# {b} is the label name, {d} its percentage of the total
pie1.set_series_opts(label_opts=opts.LabelOpts(formatter='{b}:{d}%'))
pie1.render_notebook()
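The `{d}%` in the label formatter is the share of each slice that ECharts computes from the counts. To see the same numbers as a table, a minimal sketch that divides each count by the total; `rating_shares` is a hypothetical helper and `star_num` is the per-label count Series from above:

```python
import pandas as pd


def rating_shares(star_num: pd.Series, decimals: int = 2) -> pd.Series:
    """Convert per-label counts into percentages of all reviews."""
    # Each count divided by the total, scaled to percent and rounded
    return (star_num / star_num.sum() * 100).round(decimals)
```

`df['Star rating'].value_counts(normalize=True)` gives the same fractions directly (as 0 to 1 rather than percent); either should agree with the pie labels up to rounding.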


Comment volume over time

# Count comments per day (assumes 'Comment time' values look like '2021-09-30 12:34:56')
comment_date = pd.to_datetime(df['Comment time']).dt.strftime('%Y-%m-%d').value_counts().sort_index()

# Line chart
line1 = Line(init_opts=opts.InitOpts(width='800px', height='400px'))
line1.add_xaxis(comment_date.index.tolist())
line1.add_yaxis(' ', comment_date.values.tolist(),
                # areastyle_opts=opts.AreaStyleOpts(opacity=0.5),
                label_opts=opts.LabelOpts(is_show=False))
line1.set_global_opts(title_opts=opts.TitleOpts(title='Chart of comment volume'),
                      # toolbox_opts=opts.ToolboxOpts(),
                      visualmap_opts=opts.VisualMapOpts(max_=140))
line1.set_series_opts(linestyle_opts=opts.LineStyleOpts(width=4))
line1.render_notebook()

The film was released on September 30. Comment volume started building on September 29, peaked on the 30th, and seems to have lost momentum by October 1.
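The peak and drop-off can also be read off numerically rather than from the chart. A minimal sketch, assuming the `Comment time` strings parse with `pd.to_datetime` (for example `2021-09-30 12:34:56`); `daily_counts` and `peak_and_change` are hypothetical helper names:

```python
import pandas as pd


def daily_counts(comment_times: pd.Series) -> pd.Series:
    """Count comments per calendar day, in chronological order."""
    days = pd.to_datetime(comment_times).dt.strftime("%Y-%m-%d")
    # ISO-formatted date strings sort chronologically
    return days.value_counts().sort_index()


def peak_and_change(counts: pd.Series):
    """Return the busiest day and the day-over-day change in comment volume."""
    # diff() is NaN for the first day, then today's count minus yesterday's
    return counts.idxmax(), counts.diff()
```

Calling `peak_and_change(daily_counts(df['Comment time']))` gives the peak date and a Series whose negative values mark the days where volume fell.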

Word cloud

Positive reviews

import jieba

def get_cut_words(content_series):
    # Read the stop-word list, one word per line
    with open(r"hit_stopwords.txt", 'r', encoding='utf-8') as f:
        stop_words = [line.strip() for line in f]

    # Add custom keywords to jieba's dictionary so each is kept as a single token
    my_words = ['Chosin Lake', 'Volunteers']
    for word in my_words:
        jieba.add_word(word)

    # Custom stop words
    my_stop_words = ['movie', 'Chosin Lake', 'War']
    stop_words.extend(my_stop_words)
    stop_words = set(stop_words)  # set membership tests are faster than list scans

    # Word segmentation: join all comments into one string and cut it in precise mode
    word_num = jieba.lcut(content_series.str.cat(sep='. '), cut_all=False)

    # Keep words that are not stop words and are at least two characters long
    word_num_selected = [i for i in word_num if i not in stop_words and len(i) >= 2]

    return word_num_selected

text1 = get_cut_words(content_series=df[(df['Star rating'] == 'strongly recommended') | (df['Star rating'] == 'recommended')]['Comment'])
text1[:5]

['sacrifice', 'ice', 'soldiers', 'should', 'forget']
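The word cloud below is drawn from `' '.join(text1)`, so tokens that occur more often are drawn larger. To check which words will dominate before rendering, a minimal sketch that counts token frequencies with the standard library; `top_words` is a hypothetical helper:

```python
from collections import Counter
from typing import List, Tuple


def top_words(words: List[str], n: int = 10) -> List[Tuple[str, int]]:
    """Return the n most frequent tokens with their counts, most frequent first."""
    return Counter(words).most_common(n)
```

For example, `top_words(text1, 20)` lists the twenty most common words in the positive reviews.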

import stylecloud
from IPython.display import Image  # used to display local images in JupyterLab

# Draw the word cloud for positive reviews
stylecloud.gen_stylecloud(text=' '.join(text1),
                          max_words=1000,
                          collocations=False,
                          font_path=r'Classic variety style brief.TTF',  # a local font that can render Chinese characters
                          icon_name='fas fa-thumbs-up',
                          size=360,
                          output_name='Douban positive rating word cloud image.png')

Image(filename='Douban positive rating word cloud image.png')

Negative reviews

text2 = get_cut_words(content_series=df[(df['Star rating'] == 'OK') | (df['Star rating'] == 'poor')]['Comment'])
text2[:5]

['a bit', 'disappointment', 'plot', 'business as usual', 'characters']
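The positive and negative selections both filter reviews by a set of rating labels. A minimal sketch of the same selection with `Series.isin`, which scales to any number of labels; the label groups mirror the ones used in this article, and `comments_for` is a hypothetical helper:

```python
import pandas as pd

# Label groups as used in this article
POSITIVE = ["strongly recommended", "recommended"]
NEGATIVE = ["OK", "poor"]


def comments_for(df: pd.DataFrame, labels, rating_col: str = "Star rating",
                 text_col: str = "Comment") -> pd.Series:
    """Return the comment text of reviews whose rating label is in labels."""
    # isin builds one boolean mask instead of chaining several == comparisons with |
    return df.loc[df[rating_col].isin(labels), text_col]
```

With it, the two calls become `get_cut_words(comments_for(df, POSITIVE))` and `get_cut_words(comments_for(df, NEGATIVE))`.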

# Draw the word cloud for negative reviews
stylecloud.gen_stylecloud(text=' '.join(text2),
                          max_words=1000,
                          collocations=False,
                          font_path=r'Classic variety style brief.TTF',  # a local font that can render Chinese characters
                          icon_name='fas fa-thumbs-down',
                          size=350,
                          output_name='Douban negative rating word cloud image.png')
Image(filename='Douban negative rating word cloud image.png')


