Foreword
Today I'm jumping on a trending topic: I'm going to use Python and some data analysis to show that Aquaman is a good film.
“Aquaman” takes you back, in a single film, through “How to Train Your Dragon”, “Transformers”, “Star Wars”, “Starship Troopers”, “Ender’s Game”, “Iron Soldiers”, “Alien”, “Iron Man” and “Black Panther”, borrowing plot points from each, with a little of the flavor of “Big Fish & Begonia” and a tiny bit of James Wan-style horror thrown in. It is an outstanding commercial film, and I think it is DC’s best of the year. James Wan is an excellent stitcher of genres.
In its first four days in theaters it made 740 million yuan.
Before doing any data analysis, we need to clean the data and get it into the best possible shape.
When reading the data, remember to pass header=None and then supply the column names with names.
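# Read data
import pandas as pd

def get_data():
    # The CSV has no header row, so pass header=None and supply the column names
    df = pd.read_csv("haiwang.csv", sep=",", header=None,
                     names=["nickName", "cityName", "content", "approve",
                            "reply", "startTime", "avatarurl", "score"],
                     encoding="utf-8")
    return df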
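To see what header=None and names do without needing haiwang.csv, here is a minimal sketch that reads two made-up rows from memory. The sample rows and the example.com URLs are invented for illustration; only the column names come from the post.

```python
import io

import pandas as pd

# Two invented rows in the same column order as haiwang.csv (no header row).
raw = io.StringIO(
    "userA,Beijing,Great film,3,0,2018-12-08 10:00:00,http://example.com/a.png,5\n"
    "userB,Shanghai,Not bad,0,1,2018-12-08 22:30:00,http://example.com/b.png,4\n"
)
columns = ["nickName", "cityName", "content", "approve",
           "reply", "startTime", "avatarurl", "score"]

# header=None: treat the first line as data, not as column labels.
# names=columns: attach the labels ourselves.
df = pd.read_csv(raw, sep=",", header=None, names=columns)

print(df.shape)
print(df[["nickName", "cityName", "score"]])
```

Without header=None, pandas would consume the first review as the header row and silently lose it.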
Data cleaning
- Check for duplicate rows and delete them with drop_duplicates
- After deleting rows, reset the index with reset_index
- Convert the time field to the datetime type
- Add a content_length field to check the length of each user review
# Clean data
def clean_data():
    df = get_data()
    has_copy = any(df.duplicated())  # True if any row is duplicated
    data_duplicated = df.duplicated().value_counts()
    # print(data_duplicated)  # Check how many rows are duplicated
    data = df.drop_duplicates(keep="first")  # Delete duplicate rows
    data = data.reset_index(drop=True)  # Reset index
    data["startTime"] = pd.to_datetime(data["startTime"])
    data["content_length"] = data["content"].apply(len)
    print(data.isnull().any())  # Which columns contain missing values
    # print(data[data.isnull().values == True])
    # print(data[data.nickName=="."])
    return data
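The cleaning steps can be checked on a toy frame before running them on the real data. This is only a sketch: the three rows are made up, and it uses .str.len() instead of apply(len) as an assumption on my part, because .str.len() tolerates missing review text.

```python
import pandas as pd

# Invented sample: the first two rows are exact duplicates.
toy = pd.DataFrame({
    "nickName": ["a", "a", "b"],
    "content": ["good", "good", "very good"],
    "startTime": ["2018-12-08 10:00:00", "2018-12-08 10:00:00",
                  "2018-12-09 23:15:00"],
})

# Drop duplicates, then renumber the rows 0..n-1.
cleaned = toy.drop_duplicates(keep="first").reset_index(drop=True)
# Parse the time strings into datetime64 values.
cleaned["startTime"] = pd.to_datetime(cleaned["startTime"])
# Length of each review in characters.
cleaned["content_length"] = cleaned["content"].str.len()

print(cleaned)
```

Resetting the index matters because drop_duplicates keeps the original row labels, which would leave gaps.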
View basic data
View all data
The highest number of likes is 2,783, which is very high, but the mean is only 0.25, so most reviews on Maoyan receive almost no likes. The highest number of replies is only 43, which is very low.
Let's take a look at the most-liked reviews.
The user Phantom XL got the most likes. Reading the review, it really is well written, and it praises the director. There is one oversight here: the review with the most likes that I saw on Maoyan itself is not in my data, so my crawl must have missed it.
Like ranking
Reply ranking
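The like and reply rankings can be produced with nlargest. This is a minimal sketch on made-up numbers, not the code used for the original rankings; only the column names approve and reply come from the data set.

```python
import pandas as pd

# Invented sample of reviewers with like (approve) and reply counts.
toy = pd.DataFrame({
    "nickName": ["a", "b", "c", "d"],
    "approve": [2, 10, 0, 7],
    "reply": [1, 0, 5, 3],
})

# Top 2 rows by likes and by replies, largest first.
top_liked = toy.nlargest(2, "approve")[["nickName", "approve"]]
top_replied = toy.nlargest(2, "reply")[["nickName", "reply"]]

print(top_liked)
print(top_replied)
```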
# View basic data
def analysis1():
    data = clean_data()
    print(data.describe())
    # Delete the rows whose nickname is "."
    # need_delete = data[data["nickName"]=="."]
    data = data[~(data['nickName'] == ".")]
    # data = data[~data['nickName'].isin(["."])]
    # data.drop(need_delete,axis=1,inplace=True)
    print(data["nickName"].describe())
    print(data["cityName"].describe())
Looking at the description of nickName, the rows whose nickname is "." need to be filtered out. For cityName, the city with the most reviews is Beijing.
data = data[~(data['nickName'] == ".")]
count     57838
unique    55934
top       qzuser
freq      57
Name: nickName, dtype: object
This user really did post a lot, heaping praise on the movie, O(∩_∩)O haha ~
Looking at the scores
A score of 5 is far in the lead. We've got tickets for the weekend and we're ready to go.
The charts are drawn with pyecharts. The official documentation is at
pyecharts.org/#/zh-cn/pre…
You can go to the documentation for the detailed parameter settings.
# Analyze the score situation
from pyecharts import Bar, Pie, Grid  # pyecharts 0.5.x-style API

def analysis2():
    data = clean_data()
    grouped = data.groupby(by="score")["nickName"].size()
    grouped = grouped.sort_values(ascending=False)
    index = grouped.index
    values = grouped.values
    # Bar chart
    bar = Bar("Bar chart", title_pos="left", width=240)
    bar.add("", index, values, is_label_show=True, is_legend_show=True, mark_line=["min", "max"])
    # Pie chart
    pie = Pie("Pie", title_pos="right", width=240)
    pie.add("", index, values, radius=[45, 65], center=[70, 50], is_label_show=True, legend_pos="90%", legend_orient="vertical")
    # Put both charts side by side on one page
    grid = Grid(page_title="Aquaman score details", width=1200, height=500)
    grid.add(bar, grid_right="50%")
    grid.add(pie, grid_left="70%")
    grid.render("html/score.html")
    print(data)
    print(data[data["score"] == 0])
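The grouping step behind the score charts can be tried without pyecharts. The sketch below uses value_counts on a handful of invented scores, which gives the same descending counts as the groupby plus sort_values above, and adds a percentage share as an extra that is not in the original code.

```python
import pandas as pd

# Invented scores; the real values come from the "score" column.
scores = pd.Series([5, 5, 4.5, 5, 3, 4.5, 5, 0], name="score")

# Number of reviews per score, sorted from most to least common.
counts = scores.value_counts()
# Share of each score in percent, rounded to one decimal place.
share = (counts / counts.sum() * 100).round(1)

print(counts)
print(share)
```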
Looking at the reviews
def analysis3():
    data = clean_data()
    # Sort by review length and show the 10 longest reviews
    sort_data = data.sort_values(by="content_length", ascending=False)
    print(sort_data.head(10)["content"])
Here are a few excerpts worth reading. They also happen to be good writing lessons for me.
For a science-fiction film, the story and special effects are excellent. The plot unfolds around Poseidon’s black-iron trident, and the film turns Aquaman’s power into legend: he has abilities of an Atlantean king that no other Atlantean has. Unlike many other heroes in the comics, Aquaman is not averse to killing and even has a thirst for blood, which makes his appearance and personality hard to like. Wan adapted the character to a certain extent, toning down his cruelty and highlighting his sense of inferiority. On the surface “Aquaman” is a battle for the throne, but it is really the story of Arthur growing past that inferiority, and the trident marks a very important turning point for his character.
“James Wan’s camera work in this film really shows his skill, and the sound effects come in at just the right moments to match the pictures, which earns the film points. In terms of direction, it is a hero’s coming-of-age story, heroic but also lively and warm. Wan is adept at horror and thriller films, and you can see horror techniques in how some shots are handled: he uses horror elements to hold the audience’s attention firmly without breaking the film’s overall atmosphere, and he judges the degree well. DC did the right thing in hiring James Wan this time. The effects of the underwater world are impressive, and several points in the story are designed with real spirit. I think it is better than Venom, because the murder scenes early in Venom scared me. Of course, the film is not without shortcomings: in a few places I think the pace is too fast, the plot follows routines that are easy to guess, and so on. But looking at the film as a whole, I give it four words: the flaws do not outweigh the merits.”
The effects are fine, but the plot is really thin, and I have plenty of complaints. Orm’s mother was put to death for giving birth to Aquaman, yet nobody was ever sent to kill Aquaman when he was a child. Orm decides to start a war after the land dwellers ambush them (the ambush may well have been Orm’s own plot, but it is true that people have been ruining the sea). Then Mera, Orm’s childhood friend, betrays Orm and goes over to Aquaman, and somehow, after holding hands and one hug, sparks of love fly. (Many people say she is super beautiful; I saw shades of a few other actresses in her, and with the red hair and sharp eyes she personally reminds me of Black Widow.) Then Aquaman finds the trident and switches on the invincible protagonist’s aura: with the trident in hand he first kills a pile of sea people and then becomes their king. Orm gets jilted and has his throne taken from him. Aquaman even says “I am the lord of the sea”; more like the head translator, I’d say. The sea people have it rough too. They can’t win the fight.
First, the strengths: the special effects are more than enough! Special effects! Special effects! Nothing to nitpick there. But somehow, I don’t know why, it all just feels ordinary. The hero and heroine fall in love for no clear reason. It feels like the women of Atlantis share one trait: they like men from the land, not the locals. I feel sorry for the men of the sea. One’s fiancée runs off with a man on land and bears him a son. The wife is not cherished, and after she has borne children she is sacrificed to the ugly, degenerate sea monsters?? When the daughter grows up, she falls in love with her half-brother as well. The betrayal comes out of nowhere. In the end the male lead gets his weapon not because of how brave he is, but because he can talk to the animals in the sea?? Which makes me wonder how animals in the sea can understand English. But I digress; I’m getting a little carried away. If you ask me, the trident is a big signal booster that helps broadcast his orders.
Looking at the review times
For Aquaman I only collected 4 days of data. Let's see at what times people write their reviews: most are written after 10 PM.
from pyecharts import Bar  # pyecharts 0.5.x-style API

def analysis4():
    data = clean_data()
    # Extract the hour into its own column, then reduce startTime to the date
    data["hour"] = data["startTime"].dt.hour
    data["startTime"] = data["startTime"].dt.date
    need_date = data[["startTime", "hour"]]

    def get_hour_size(data):
        # Number of reviews per hour within one day
        hour_data = data.groupby(by="hour")["hour"].size().reset_index(name="count")
        return hour_data

    data = need_date.groupby(by="startTime").apply(get_hour_size)
    # Rows are dates, columns are hours, values are review counts
    data_reshape = data.pivot_table(index="startTime", columns="hour", values="count")
    bar = Bar("Review analysis by time of day", width=1200, height=600, title_pos="center")
    data_reshape.fillna(0, inplace=True)
    print(data_reshape)
    # One bar series per hour, with the dates on the x axis
    for index, row in data_reshape.T.iterrows():
        print(data_reshape.index)
        v1 = list(row.values)
        bar.add(str(index) + ":00", row.index, v1, is_legend_show=True, legend_pos="80%", legend_text_size=8)
    bar.render("html/1.html")
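The date-by-hour table can also be built in one step with pd.crosstab. This is an alternative sketch on five invented timestamps, not the approach used above. It fills missing date/hour combinations with 0 by default, so no fillna is needed.

```python
import pandas as pd

# Invented review timestamps.
times = pd.to_datetime(pd.Series([
    "2018-12-07 22:05:00",
    "2018-12-07 22:40:00",
    "2018-12-07 23:10:00",
    "2018-12-08 09:30:00",
    "2018-12-08 22:15:00",
]))

# Rows: calendar date; columns: hour of day; cells: number of reviews.
table = pd.crosstab(times.dt.date, times.dt.hour,
                    rownames=["date"], colnames=["hour"])

# Hour with the most reviews across all days.
busiest_hour = table.sum().idxmax()

print(table)
print(busiest_hour)
```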
Fan distribution
# Process place-name data so that names missing from the coordinate file do not break the map
import json

# Placeholder: replace with the path to the city_coordinates.json file
COORDINATES_FILE = "city_coordinates.json"

def handle(cities):
    # Get all place names in the coordinate file
    with open(COORDINATES_FILE, mode='r', encoding='utf-8') as f:
        data = json.loads(f.read())  # Convert str to JSON
    # Loop over the cities and check each one
    data_new = data.copy()  # Copy all place-name data
    for city in set(cities):  # Use set to deduplicate
        # Drop entries with an empty place name
        if city == '':
            while city in cities:
                cities.remove(city)
            continue
        for k in data.keys():
            if k == city:
                break
            if k.startswith(city):
                # print(k, city)
                data_new[city] = data[k]
                break
            if k.startswith(city[0:-1]) and len(city) >= 3:
                data_new[city] = data[k]
                break
        else:
            # No key matched: drop place names that do not exist in the file
            while city in cities:
                cities.remove(city)
    # Write back, overwriting the coordinate file
    with open(COORDINATES_FILE, mode='w', encoding='utf-8') as f:
        f.write(json.dumps(data_new, ensure_ascii=False))  # Convert JSON to str
from collections import Counter
from pyecharts import Geo, Style  # pyecharts 0.5.x-style API

def analysis6():
    data = clean_data()
    cities = list(data[~data["cityName"].isnull()]["cityName"].values)
    handle(cities)
    style = Style(
        title_color='#fff',
        title_pos='center',
        width=1200,
        height=600,
        background_color='#404a59'
    )
    # (city, count) pairs, most common first
    new_cities = Counter(cities).most_common()
    geo = Geo("Aquaman fan distribution", "Data source: CSDN - Dream Eraser", **style.init_style)
    attr, value = geo.cast(new_cities)
    geo.add('', attr, value, visual_range=[0, 3500], visual_text_color='#fff', symbol_size=15, is_visualmap=True, is_piecewise=True, visual_split_number=10)
    geo.render('Fan location distribution-GEO.html')
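The name-matching idea inside handle can be isolated into a small pure function, which makes it easier to test without touching the coordinate file. This is a sketch of a variation, not the original logic line for line: match_city is a hypothetical helper, the place names and coordinates are invented, and it tries an exact match first, then a prefix match, then a prefix match with the last character dropped.

```python
def match_city(city, known):
    """Return the key in `known` that best matches `city`, or None."""
    if not city:
        return None  # empty place names never match
    if city in known:
        return city  # exact match
    for key in known:
        if key.startswith(city):
            return key  # e.g. short name vs. full administrative name
    if len(city) >= 3:
        stem = city[:-1]  # drop the last character and retry
        for key in known:
            if key.startswith(stem):
                return key
    return None


# Invented coordinate table (values are illustrative, not real data).
known = {
    "Beijing": [116.4, 39.9],
    "Shanghai": [121.5, 31.2],
    "Enshi Prefecture": [109.5, 30.3],
}

print(match_city("Enshi", known))
print(match_city("Beijings", known))
print(match_city("Atlantis", known))
```

Returning None for unmatched names lets the caller decide whether to drop the city, as handle does, or to log it for manual correction.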
Word cloud
import jieba.analyse
from pyecharts import WordCloud  # pyecharts 0.5.x-style API

def analysis7():
    data = clean_data()
    contents = list(data["content"].values)
    try:
        jieba.analyse.set_stop_words('stopwords.txt')
        # Top 100 keywords as (word, weight) pairs
        tags = jieba.analyse.extract_tags(str(contents), topK=100, withWeight=True)
        name = []
        value = []
        for v, n in tags:
            # Weights are decimals, so multiply by ten thousand and truncate to an integer
            name.append(v)
            value.append(int(n * 10000))
        wordcloud = WordCloud(width=1300, height=620)
        wordcloud.add("", name, value, word_size_range=[20, 100])
        wordcloud.render()
    except Exception as e:
        print("Error:", e)
It's praise all round: good effects, good plot, no dull stretches, DC, Aquaman, James Wan, the heroine. Off to the cinema this weekend.