A few words up front

Today I'm jumping on a trending topic: I'm going to use Python and some data analysis to show that Aquaman is a good film.

"Aquaman" will remind you of one film after another: "How to Train Your Dragon", "Transformers", "Star Wars", "Starship Troopers", "Ender's Game", "Iron Soldiers" and "Alien". It may also borrow plot points from the rival studio's "Iron Man" and "Black Panther", with a little of the flavor of "Big Fish & Begonia" and a tiny dash of James Wan style horror. It is an outstanding commercial film, and I think it is DC's best of the year. James Wan is an excellent synthesizer.

In its first four days in theaters it made 740 million yuan.

Before doing any data analysis, we need to clean the data and get it into the best possible shape.

When you read the data, remember to pass header=None and then supply the column names with names, because the CSV file has no header row.

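# Read data
import pandas as pd

def get_data():
    # The CSV has no header row, so pass header=None and supply the column names
    df = pd.read_csv("haiwang.csv", sep=",", header=None, names=["nickName", "cityName", "content", "approve", "reply", "startTime", "avatarurl", "score"], encoding="utf-8")
    return df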
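
To see what header=None and names actually do, here is a minimal sketch that reads two made-up rows from an in-memory string instead of haiwang.csv. The sample rows and the example.com URLs are invented for illustration; only the column names come from the code above.

```python
import io
import pandas as pd

# Two made-up rows in the same column order as the article's CSV (no header row).
raw = (
    "user_a,Beijing,Great effects,3,0,2018-12-08 22:15:00,http://example.com/a.png,5\n"
    "user_b,Shanghai,Plot is predictable,0,1,2018-12-09 10:02:00,http://example.com/b.png,3.5\n"
)

columns = ["nickName", "cityName", "content", "approve",
           "reply", "startTime", "avatarurl", "score"]

# header=None stops pandas from using the first data row as column names;
# names= supplies the column labels instead.
df = pd.read_csv(io.StringIO(raw), sep=",", header=None, names=columns)

print(df.shape)
print(list(df.columns))
```

Without header=None, the first review would be swallowed as the header and silently lost from the data.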

Data cleaning

  1. Check whether the data contains duplicates, and use drop_duplicates to delete them
  2. After deleting rows, reset the index with reset_index
  3. Convert the time field to the datetime type
  4. Add a content_length field to check the length of each user review

# Clean data
def clean_data():
    df = get_data()
    has_copy = any(df.duplicated())
    data_duplicated = df.duplicated().value_counts()
    # print(data_duplicated)  # Check how much data is duplicated
    data = df.drop_duplicates(keep="first")  # Delete duplicate rows
    data = data.reset_index(drop=True)  # Reset index
    data["startTime"] = pd.to_datetime(data["startTime"])  # Parse timestamps
    data["content_length"] = data["content"].apply(len)  # Length of each review
    print(data.isnull().any())  # Which columns contain missing values
    # print(data[data.isnull().values == True])
    # print(data[data.nickName=="."])
    return data
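
The four cleaning steps can be tried on a tiny hand-made DataFrame. This is only a sketch: the three rows are invented, and I use .str.len() for the length column as an alternative to apply(len), since it returns NaN for missing text instead of raising.

```python
import pandas as pd

# Made-up sample: the first two rows are exact duplicates.
df = pd.DataFrame({
    "nickName": ["user_a", "user_a", "user_b"],
    "content": ["Great effects", "Great effects", "Too long"],
    "startTime": ["2018-12-08 22:15:00", "2018-12-08 22:15:00", "2018-12-09 10:02:00"],
})

# Step 1: detect and drop duplicate rows, keeping the first occurrence.
has_duplicates = bool(df.duplicated().any())
data = df.drop_duplicates(keep="first")

# Step 2: rebuild a clean 0..n-1 index after rows were removed.
data = data.reset_index(drop=True)

# Step 3: turn the time strings into real datetime values.
data["startTime"] = pd.to_datetime(data["startTime"])

# Step 4: add the review length as a new column.
data["content_length"] = data["content"].str.len()

print(data)
```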

View basic data

View all data

The highest number of likes is 2,783, which is very high, but the average is only 0.25, so Maoyan users evidently don't hand out likes very often. The highest number of replies is only 43, which is very low.

Take a look at the most-liked reviews

The user Phantom XL got the most likes. Looking at his review: well, it really is well written, and it praises the director. There is a small oversight here, though. The review I saw with the most likes on Maoyan is not in the data I scraped, so the crawl must have missed it. My mistake!

Like ranking

Reply ranking

# View basic data
def analysis1():
    data = clean_data()
    print(data.describe())
    # Remove rows whose nickName is "."
    # need_delete = data[data["nickName"]=="."]
    data = data[~(data['nickName'] == ".")]
    # data = data[~data['nickName'].isin(["."])]
    # data.drop(need_delete,axis=1,inplace=True)

    print(data["nickName"].describe())
    print(data["cityName"].describe())
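
Here is a small sketch of the filtering step used in analysis1, run on invented nicknames so the effect is easy to see. The boolean mask marks the rows to drop, and ~ inverts it to keep everything else.

```python
import pandas as pd

# Made-up sample with two placeholder "." nicknames.
data = pd.DataFrame({
    "nickName": [".", "user_a", "qzuser", "."],
    "cityName": ["Beijing", "Beijing", "Shanghai", "Chengdu"],
})

# True for every row whose nickname is just "."
mask = data["nickName"] == "."

# ~mask flips the booleans, so only the real nicknames remain.
filtered = data[~mask]

# For a text column, describe() reports count, unique, top and freq.
summary = filtered["nickName"].describe()
print(summary)
```

The commented-out isin(["."]) variant in analysis1 does the same thing and scales better when there are several placeholder values to drop.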

Check the description of the nickName column. Rows whose nickname is "." need to be filtered out. From the cityName description, the city with the most reviewers is Beijing.

data = data[~(data['nickName'] == ".")]

count      57838
unique     55934
top       qzuser
freq          57
Name: nickName, dtype: object

This buddy really posted a lot, strongly praising the movie, O(∩_∩)O haha ~

Look at the scores

5-point ratings are far ahead of everything else. I've got tickets for the weekend and I'm ready to go.

The charts were built with pyecharts; the official documentation is at

Pyecharts.org/#/zh-cn/pre…

You can go to the documentation to see the detailed parameter settings.

# Analyze the score distribution
# Chart classes from the pyecharts 0.5.x API used throughout this article
from pyecharts import Bar, Pie, Grid

def analysis2():
    data = clean_data()
    # Count the reviews for each score, largest group first
    grouped = data.groupby(by="score")["nickName"].size()
    grouped = grouped.sort_values(ascending=False)
    index = grouped.index
    values = grouped.values
    # Bar chart
    bar = Bar("Bar chart", title_pos="left", width=240)
    bar.add("", index, values, is_label_show=True, is_legend_show=True, mark_line=["min", "max"])

    # Pie chart
    pie = Pie("Pie", title_pos="right", width=240)
    pie.add("", index, values, radius=[45, 65], center=[70, 50], is_label_show=True, legend_pos="90%", legend_orient="vertical")

    # Place both charts side by side on one page
    grid = Grid(page_title="Aquaman score details", width=1200, height=500)
    grid.add(bar, grid_right="50%")
    grid.add(pie, grid_left="70%")

    grid.render("html/score.html")
    print(data)
    print(data[data["score"] == 0])

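The pandas half of analysis2 can be checked on its own, without pyecharts. A minimal sketch with six invented reviews: group by score, count the rows in each group, and sort so the most common score comes first. The resulting two lists are what analysis2 passes to bar.add and pie.add.

```python
import pandas as pd

# Made-up sample: three 5-point reviews, two 4-point reviews, one 0-point review.
data = pd.DataFrame({
    "nickName": ["a", "b", "c", "d", "e", "f"],
    "score": [5, 5, 5, 4, 4, 0],
})

# size() counts the rows in each score group.
grouped = data.groupby(by="score")["nickName"].size()

# Most common score first.
grouped = grouped.sort_values(ascending=False)

scores = grouped.index.tolist()   # x-axis labels
counts = grouped.values.tolist()  # bar heights / pie slices
print(scores, counts)
```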

Look at the reviews

def analysis3():
    data = clean_data()
    # Longest reviews first
    sort_data = data.sort_values(by="content_length", ascending=False)
    print(sort_data.head(10)["content"])


Here are a few excerpts worth a look. Along the way I can pick up some writing tips too.

The story and special effects are excellent for a science fiction film. The story unfolds around Poseidon's black-iron trident, and the film turns Aquaman's power into legend: he has an ability as king of Atlantis that no other Atlantean has. Unlike many other heroes in the comics, Aquaman is not disgusted by killing and even has a thirst for blood, which makes his appearance and personality hard to like. Wan adapted Aquaman's character to a certain extent, weakening his cruelty and highlighting his sense of inferiority. On the surface "Aquaman" is a battle for the throne, but it is really the story of Arthur growing up and overcoming that inferiority, and the trident is a very important turning point for his character.

"James Wan is really skilled with the camera in this film, and the sound effects kick in at just the right moments to match the visuals, which earns the film points. In terms of direction, the film is a hero's coming-of-age story with its share of warmth. Wan is adept at horror and thriller films, and you can see horror-style handling in some of the shots: he uses just enough horror elements to hold the audience's attention without destroying the film's overall atmosphere, and the balance is well judged. DC did the right thing in asking James Wan to direct this time. The underwater world's effects are impressive, and several points of the story design are quite inspired. I think it is better than Venom, because the murder scenes early in Venom scared me. Of course the film is not without shortcomings: in a few places I think the pacing is too fast, the plot follows a formula that is easy to guess, and so on. But looking at the film as a whole, I give it four words: the flaws don't outweigh the merits."

The effects are fine, but the plot is really so-so. Here are my gripes. The whole setup is that Orm's mother died because she gave birth to Aquaman, yet he never sent anyone to kill Aquaman in childhood. Orm decides to start a war only after people from the land ambush them (it may have been Orm's own plot, but the sea really was being wrecked by land people). Then Mera, Orm's childhood friend, betrays Orm and goes off with Aquaman, and somehow, after holding hands and a hug, sparks of love fly. (Many people say she is super beautiful. I saw the shadow of several other actresses in her: with the red hair and sharp eyes, she personally reminds me of Black Widow.) Then Aquaman finds the trident, the protagonist's halo kicks in and he is invincible. Trident in hand, he first kills a pile of sea people and then becomes their king. Orm is cuckolded and has his throne taken. Aquaman even says "I am the lord of the sea"; you'd think he was the head interpreter. The sea people have it rough too. They can't even put up a fight.

First, the good: the special effects are plentiful! Special effects! Special effects! Nothing to nitpick there. But the plot, I don't know why, just feels average. The male and female leads fall in love inexplicably. It feels like the women of Atlantis have one thing in common: they like men on land, not local ones. I feel sorry for the men in the sea. One man's fiancee ran off with a man on land and gave birth to a son. He did not cherish his wife either: after she had borne him children, she was sacrificed to the ugly, degenerate sea monsters?? When the daughter grew up, she fell in love with her half-brother again. The betrayal is inexplicable. Finally, the male lead gets his weapon not because of how brave he is, but because he can talk to the animals in the sea?? Then I wonder how animals in the sea can understand English. I digress, so back to the point. I'm being a little nitpicky. If you ask me, the trident is a large signal amplifier that helps broadcast orders.

Take a look at the comment times

For Aquaman, I only got 4 days’ data. Let’s see what time people are writing reviews. Most people write reviews after 10 PM

from pyecharts import Bar  # pyecharts 0.5.x API

def analysis4():
    data = clean_data()
    # Extract the hour and the calendar date from the timestamp
    data["hour"] = data["startTime"].dt.hour
    data["startTime"] = data["startTime"].dt.date
    need_date = data[["startTime", "hour"]]

    def get_hour_size(data):
        # Count the reviews posted in each hour of one day
        hour_data = data.groupby(by="hour")["hour"].size().reset_index(name="count")
        return hour_data
    data = need_date.groupby(by="startTime").apply(get_hour_size)

    # Rows are dates, columns are hours, cells are review counts
    data_reshape = data.pivot_table(index="startTime", columns="hour", values="count")

    bar = Bar("Review analysis by time of day", width=1200, height=600, title_pos="center")
    data_reshape.fillna(0, inplace=True)
    print(data_reshape)
    # One bar series per hour, with the dates on the x-axis
    for index, row in data_reshape.T.iterrows():
        print(data_reshape.index)
        v1 = list(row.values)

        bar.add(str(index) + ":00", row.index, v1, is_legend_show=True, legend_pos="80%", legend_text_size=8)

    bar.render("html/1.html")
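
The date-by-hour table that analysis4 builds can be sketched more compactly. Note this is a swapped-in formulation, not the code above: it uses groupby on two keys followed by unstack instead of apply plus pivot_table, and the five timestamps are invented.

```python
import pandas as pd

# Made-up review timestamps spread over two days.
times = pd.to_datetime([
    "2018-12-07 22:10:00",
    "2018-12-07 22:45:00",
    "2018-12-07 23:05:00",
    "2018-12-08 10:30:00",
    "2018-12-08 22:01:00",
])
data = pd.DataFrame({"startTime": times})

# Split each timestamp into a calendar date and an hour of day.
data["hour"] = data["startTime"].dt.hour
data["date"] = data["startTime"].dt.date

# Count reviews per (date, hour), then move the hours into columns.
# fill_value=0 replaces missing combinations with 0 instead of NaN.
table = data.groupby(["date", "hour"]).size().unstack(fill_value=0)

print(table)
```

Each column of the table is one hour, so iterating over table.T gives one series per hour, the same shape the plotting loop in analysis4 expects.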

Fan distribution

import json
from collections import Counter
from pyecharts import Geo, Style  # pyecharts 0.5.x API

# Process place-name data so that names missing from the coordinate file can still be matched
def handle(cities):
    # Get all place names in the coordinate file
    data = None
    with open(
            '<path to city_coordinates.json>',
            mode='r', encoding='utf-8') as f:
        data = json.loads(f.read())  # Convert str to JSON

    # Loop over the cities and try to match each one
    data_new = data.copy()  # Copy all place-name data
    for city in set(cities):  # Use set to deduplicate
        # Drop entries with empty place names
        if city.strip() == '':
            while city in cities:
                cities.remove(city)
            continue
        for k in data.keys():
            if k == city:
                break
            if k.startswith(city):
                # print(k, city)
                data_new[city] = data[k]
                break
            if k.startswith(city[0:-1]) and len(city) >= 3:
                data_new[city] = data[k]
                break
        else:
            # No key matched: drop place names that do not exist in the file
            while city in cities:
                cities.remove(city)

    # Overwrite the coordinate file
    with open(
            '<path to city_coordinates.json>',
            mode='w', encoding='utf-8') as f:
        f.write(json.dumps(data_new, ensure_ascii=False))  # Convert JSON to str


def analysis6():
    data = clean_data()
    cities = list(data[~data["cityName"].isnull()]["cityName"].values)
    handle(cities)

    style = Style(
        title_color='#fff',
        title_pos='center',
        width=1200,
        height=600,
        background_color='#404a59'
    )

    # (city, count) pairs, most frequent city first
    new_cities = Counter(cities).most_common()

    geo = Geo("Aquaman fan distribution", "Data source: CSDN-Dream Eraser", **style.init_style)
    attr, value = geo.cast(new_cities)
    geo.add('', attr, value, visual_range=[0, 3500], visual_text_color='#fff', symbol_size=15, is_visualmap=True, is_piecewise=True, visual_split_number=10)
    geo.render('Fan location distribution-GEO.html')
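
The name-matching idea inside handle can be isolated into a small pure function, which makes it easier to test without touching any file. This is a sketch under assumptions: match_city is a hypothetical helper of my own, the coordinates are rough illustrative values, and unlike handle it checks the three rules in strict priority order (exact match, then prefix, then prefix with the last character dropped).

```python
from collections import Counter

def match_city(city, known):
    """Return the key in `known` that matches `city`, or None if nothing fits."""
    if not city.strip():
        return None                      # empty place name
    if city in known:
        return city                      # exact match
    for key in known:
        if key.startswith(city):
            return key                   # review has a shortened name
    if len(city) >= 3:
        for key in known:
            if key.startswith(city[:-1]):
                return key               # review has an extra suffix such as "市"
    return None

# Illustrative coordinate table (values are approximate, for demonstration only).
known = {
    "北京": [116.4, 39.9],
    "哈尔滨": [126.6, 45.8],
    "大理白族自治州": [100.2, 25.6],
}

cities = ["北京", "大理", "哈尔滨市", "", "火星", "北京"]

# Keep only the cities that could be resolved, then count them.
resolved = []
for city in cities:
    key = match_city(city, known)
    if key is not None:
        resolved.append(key)

counts = Counter(resolved)
print(counts.most_common())
```

Returning the matched key instead of writing an alias back into the coordinate file keeps the file untouched, which may or may not suit your setup.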

Word cloud

import jieba.analyse
from pyecharts import WordCloud  # pyecharts 0.5.x API

def analysis7():
    data = clean_data()
    contents = list(data["content"].values)
    try:
        # Ignore common filler words listed in stopwords.txt
        jieba.analyse.set_stop_words('stopwords.txt')
        # Top 100 keywords with their weights
        tags = jieba.analyse.extract_tags(str(contents), topK=100, withWeight=True)
        name = []
        value = []
        for v, n in tags:
            # Weights are decimals; scale by 10,000 and truncate to an integer
            name.append(v)
            value.append(int(n * 10000))
        wordcloud = WordCloud(width=1300, height=620)
        wordcloud.add("", name, value, word_size_range=[20, 100])
        wordcloud.render()
    except Exception as e:
        print("Error:", e)
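
One detail in analysis7 is easy to miss: int() truncates rather than rounds. The sketch below uses made-up (word, weight) pairs in the same shape that extract_tags returns with withWeight=True, so it runs without jieba. The weights are chosen to be exactly representable as floats.

```python
# Made-up (word, weight) pairs, shaped like the output of
# jieba.analyse.extract_tags(..., withWeight=True).
tags = [("特效", 0.5), ("剧情", 0.25), ("温子仁", 0.125)]

# Split the pairs into the two parallel lists the word cloud needs.
names = [word for word, _ in tags]

# Scale the decimal weights to integers. int() drops the fractional part.
values = [int(weight * 10000) for _, weight in tags]

# int() and round() can disagree when the scaled weight has a fraction.
truncated = int(0.99999 * 10000)
rounded = round(0.99999 * 10000)

print(names, values, truncated, rounded)
```

For a word cloud the difference is invisible, but round() is arguably the safer habit if the integers are reused elsewhere.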

It's all praise: good effects, good plot, no dull moments, DC, Aquaman, James Wan, the female lead. Off to the cinema this weekend.